Efficiently Executing Sparse Matrix-Matrix Multiplication on General Purpose Digital Signal Processor

2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys)(2022)

Abstract
Sparse Matrix-Matrix Multiplication (SpMM) is a key kernel in large-scale scientific computing and deep learning. Previous SpMM optimization research has focused on CPU or GPU architectures, with little attention paid to the General Purpose Digital Signal Processor (GPDSP). Earlier work shows that reducing irregular memory access and increasing data reuse can improve SpMM performance. In this paper, we propose an effective SpMM vectorization method for the GPDSP Matrix2000b. Based on the memory hierarchy of Matrix2000b, we design an adaptive sparse matrix tiling scheme and combine it with row-major computation to improve data reuse. We then implement a multi-level parallel algorithm that exploits thread-level, data-level, and instruction-level parallelism across the multi-level computing components of Matrix2000b to vectorize the computation and eliminate irregular memory access. We also design a three-level double-buffering workflow based on the memory hierarchy to overlap computation with memory access. With a theoretical memory bandwidth of 42.6 GB/s, experimental evaluation shows that the average performance reaches 34.81 GFLOPS and the peak performance reaches 65.47 GFLOPS.
Keywords
SpMM, GPDSP, Adaptive Tiling, Vectorization, Multi-thread