A Technique for Improving Performance of Moderately Sparse Matrix Algorithms

semanticscholar (2020)

Abstract
We seek answers to the following questions: (i) at what sparsity levels is it worth abandoning compressed matrix representations in favor of dense representations that store both zero and non-zero values, and (ii) even if compressed representations are used, is it useful to expand the matrices internally to achieve a high degree of parallelism? In this paper we explore the second question using a specialized load/store unit (LSU). Our LSU expands sparse matrices into dense matrices by filling rows (or columns) with zeros as needed, allowing for high degrees of parallelism (such as SIMD). The computational elements use dense matrix algorithms and perform no index computations. We explore the solution within the context of Processing-in-Memory (PIM), where several simple processing elements are included within the logic layer of a 3D-stacked memory. Our studies show more than a 30 percent speedup and 80 percent power savings for sparse matrix multiplication over a baseline of conventional multicore CPUs running sparse matrix programs; these gains are achievable when the number of non-zero elements exceeds 30 percent (i.e., sparsity is below 70 percent).
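To make the expand-on-load idea concrete, the following is a minimal software sketch of the same transformation, assuming a standard CSR layout; the names (csr_matrix, expand_row_dense, dense_dot) are illustrative and not the paper's API, and the paper performs this expansion in a hardware LSU inside the memory's logic layer rather than in software.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CSR representation (names are illustrative, not from the paper). */
typedef struct {
    int     n_rows, n_cols;
    int    *row_ptr;   /* length n_rows + 1 */
    int    *col_idx;   /* length nnz */
    double *values;    /* length nnz */
} csr_matrix;

/* Expand one CSR row into a dense, zero-filled buffer. This mirrors what the
 * paper's load/store unit does: the compute side then sees only dense data
 * and performs no index arithmetic. */
static void expand_row_dense(const csr_matrix *A, int row, double *dense_row)
{
    memset(dense_row, 0, (size_t)A->n_cols * sizeof(double));
    for (int k = A->row_ptr[row]; k < A->row_ptr[row + 1]; ++k)
        dense_row[A->col_idx[k]] = A->values[k];
}

/* Dense dot product over the expanded row. A straight-line loop like this
 * vectorizes readily (SIMD), unlike indexed sparse gathers. */
static double dense_dot(const double *a, const double *b, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

The trade-off the abstract quantifies follows directly from this sketch: the expanded row wastes bandwidth and arithmetic on zeros, which only pays off when zeros are not too dominant (here, below roughly 70 percent sparsity).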