TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs

CCF Trans. High Perform. Comput.(2023)

引用 1|浏览4
暂无评分
摘要
Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in sparse basic linear algebra subprograms (BLAS). Compared to another level-2 sparse BLAS kernel sparse matrix–vector multiplication (SpMV), SpTRSV is in general more difficult to find high parallelism on many-core processors, such as GPUs. Nowadays, much work focuses on reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV. However, there is less work that can make good use of sparse spatial structure for SpTRSV on GPUs. In this paper, we propose a tiled algorithm called TileSpTRSV for optimizing SpTRSV on GPUs through exploiting 2D spatial structure of sparse matrices. We design two algorithm implementations, i.e., TileSpTRSV_level-set and TileSpTRSV_sync-free, for TileSpTRSV on top of level-set and Sync-free algorithms, respectively. By testing 16 representative matrices on a latest NVIDIA GPU, the experimental results show that TileSpTRSV_level-set gives on average 5.29 × (up to 38.10 × ), 5.33 × (up to 21.32 × ) and 2.62 × (up to 12.87 × ) speedups over cuSPARSE, Sync-free and Recblock algorithms on the 16 representative matrices, respectively.
更多
查看译文
关键词
Sparse matrix,Sparse triangular solve,Tiled algorithm,GPU
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要