EC-SpMM: Efficient Compilation of SpMM Kernel on GPUs

Junqing Lin, Honghe Zhang, Xiaolong Shi, Jingwei Sun, Xianzhi Yu, Jun Yao, Guangzhong Sun

Proceedings of the 52nd International Conference on Parallel Processing (ICPP 2023)

Abstract
As deep neural networks (DNNs) become increasingly large and complicated, pruning techniques are proposed to lower memory footprint and enable more efficient inference. The most critical kernel for executing pruned sparse DNNs on GPUs is Sparse-dense Matrix Multiplication (SpMM). Although recent tensor compilers can generate high-performance SpMM code, they often take a long time to iteratively search candidate configurations. Such long compilation slows down the cycle of exploring better DNN architectures or pruning algorithms. In this paper, we propose EC-SpMM to efficiently generate high-performance SpMM kernels for sparse DNN inference. Based on an analysis of the layout of nonzero elements, a characterization of the GPU architecture, and a rank-based cost model, EC-SpMM can effectively reduce the search space and eliminate likely low-performance candidates. Experimental results show that EC-SpMM reduces compilation time by a factor of 35x, while the performance of the generated SpMM kernels is comparable to or even better than that of the state-of-the-art sparse tensor compiling solution.
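For context, SpMM computes C = A x B, where A is an M x K sparse matrix (here stored in CSR format), B is a K x N dense matrix, and C is the M x N dense result. The CUDA sketch below is a naive baseline that only illustrates the operation EC-SpMM targets; it is not the paper's method or its generated code, and the kernel name and launch configuration are illustrative assumptions.

#include <cuda_runtime.h>

// Naive CSR-based SpMM baseline (illustrative, not EC-SpMM's generated kernel):
// C = A * B, with A as M x K sparse in CSR (rowPtr, colIdx, vals),
// B as K x N dense (row-major), C as M x N dense (row-major).
// One thread computes one element C[row][col].
__global__ void spmm_csr_naive(int M, int N,
                               const int* __restrict__ rowPtr,
                               const int* __restrict__ colIdx,
                               const float* __restrict__ vals,
                               const float* __restrict__ B,
                               float* __restrict__ C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;

    float acc = 0.0f;
    // Accumulate over the nonzeros of row `row` of A only,
    // skipping the zeros a dense GEMM would multiply through.
    for (int i = rowPtr[row]; i < rowPtr[row + 1]; ++i) {
        acc += vals[i] * B[colIdx[i] * N + col];
    }
    C[row * N + col] = acc;
}

// Example launch (hypothetical sizes and tiling):
//   dim3 block(32, 8);
//   dim3 grid((N + 31) / 32, (M + 7) / 8);
//   spmm_csr_naive<<<grid, block>>>(M, N, dRowPtr, dColIdx, dVals, dB, dC);

Tuned SpMM kernels differ from this baseline mainly in scheduling choices such as tiling, thread mapping, and shared-memory reuse; the space of such configurations is what tensor compilers search, and what EC-SpMM prunes with its cost model.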
Keywords
Sparse-dense matrix multiplication, deep neural network, tensor compiler, GPU