Mixed-precision block incomplete sparse approximate preconditioner on Tensor core

CCF Transactions on High Performance Computing (2024)

Abstract
In this paper, we propose and implement a mixed-precision Block-ISAI preconditioner for solving linear systems arising in multiphysics applications. By using FP32 arithmetic, our approach accelerates the sparse matrix–vector product kernel while maintaining satisfactory accuracy. We also propose an efficient, warp-based GPU implementation of the Block-ISAI preconditioner with Tensor core acceleration; its matrix-multiplication portion runs on the double-precision Tensor cores of the NVIDIA A100 GPU. Detailed comparisons show a noteworthy speedup: our method is 6x faster than cuSPARSE and 11.2x faster than PETSc’s built-in preconditioner.
Keywords
Block-ISAI, GPU, Mixed-precision, Tensor core, Preconditioner
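
The abstract's claim that an FP32 sparse matrix–vector product can keep "satisfactory accuracy" can be illustrated with a small CPU-side sketch. This is not the paper's GPU kernel and uses no Tensor cores: it only emulates FP32 rounding in plain Python and compares the result with an FP64 product. The helper names (`to_fp32`, `spmv_csr`) and the 3x3 test matrix are assumptions made for illustration.

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def spmv_csr(row_ptr, col_idx, vals, x, low_precision=False):
    """Compute y = A @ x for a CSR matrix.

    With low_precision=True, every stored value and every intermediate
    result is rounded to FP32, emulating an FP32 kernel.
    """
    rnd = to_fp32 if low_precision else (lambda v: v)
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            prod = rnd(rnd(vals[k]) * rnd(x[col_idx[k]]))
            acc = rnd(acc + prod)
        y.append(acc)
    return y

# Hypothetical test matrix [[4,-1,0],[-1,4,-1],[0,-1,4]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
x = [0.1, 0.2, 0.3]

y64 = spmv_csr(row_ptr, col_idx, vals, x)                      # FP64 reference
y32 = spmv_csr(row_ptr, col_idx, vals, x, low_precision=True)  # emulated FP32

# Largest componentwise difference, scaled by the largest reference entry.
rel_err = max(abs(a - b) for a, b in zip(y64, y32)) / max(abs(a) for a in y64)
print(rel_err)
```

For this well-conditioned example the FP32 result stays close to the FP64 reference, which is why a low-precision product is often acceptable inside a preconditioner: the outer iterative solver can absorb small perturbations. Whether that holds for a given multiphysics system depends on its conditioning.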