rocBLAS-2.1.0
Changelist:
- Refactored the rocBLAS test framework
- Improved performance of i8_r/i32_r rocblas_gemm_ex on gfx906
- Added a simple trsv implementation using trsm
- Improved performance of trsm
- Tuning improvements for resnet50 problems
- Updated tuning to use the new Tensile solution selection logic
- Improved rocblas_gemm_ex performance when ldd == ldc and strideD == strideC
- Bug fixes for IAMIN and TRSV
- Added Sphinx-based Read the Docs documentation