Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications

SC(2014)

Abstract
Sparse matrix-vector multiplication (SpMV) is a widely used computational kernel. The most commonly used format for a sparse matrix is CSR (Compressed Sparse Row), but a number of other representations have recently been developed that achieve higher SpMV performance. However, these alternative representations typically impose a significant preprocessing overhead. While a high preprocessing overhead can be amortized for applications requiring many iterative invocations of SpMV that use the same matrix, this is not always feasible, for instance when analyzing large, dynamically evolving graphs.

This paper presents ACSR, an adaptive SpMV algorithm that uses the standard CSR format but reduces thread divergence by combining rows into groups (bins) that have a similar number of non-zero elements. Further, for rows in bins that span a wide range of non-zero counts, dynamic parallelism is leveraged. A significant benefit of ACSR over other proposed SpMV approaches is that it works directly with the standard CSR format, and thus avoids significant preprocessing overheads. A CUDA implementation of ACSR is shown to outperform SpMV implementations in the NVIDIA CUSP and cuSPARSE libraries on a set of sparse matrices representing power-law graphs. We also demonstrate the use of ACSR for the analysis of dynamic graphs, where the improvement over extant approaches is even higher.
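
The binning idea in the abstract can be illustrated with a small CPU-side sketch in Python. It is not the authors' CUDA implementation. The abstract does not state how bin boundaries are chosen, so the power-of-two ranges used here are an assumption, and the function names bin_rows_by_nnz and spmv_binned are hypothetical. Dynamic parallelism for bins with widely varying row lengths is not modeled.

```python
def bin_rows_by_nnz(row_ptr):
    """Group CSR row indices into bins of similar non-zero count.

    Assumed binning rule (not taken from the abstract): bin 0 holds rows
    with 0 or 1 non-zeros, and bin b >= 1 holds rows whose non-zero count
    lies in (2**(b-1), 2**b].
    """
    bins = {}
    for row in range(len(row_ptr) - 1):
        nnz = row_ptr[row + 1] - row_ptr[row]
        b = 0 if nnz <= 1 else (nnz - 1).bit_length()
        bins.setdefault(b, []).append(row)
    return bins


def spmv_binned(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a CSR matrix, visiting rows bin by bin.

    On a GPU each bin could be handled by its own kernel launch, so that
    threads working together see rows of similar length. Here the bins
    are simply processed one after another on the CPU.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for _, rows in sorted(bin_rows_by_nnz(row_ptr).items()):
        for row in rows:
            acc = 0.0
            for k in range(row_ptr[row], row_ptr[row + 1]):
                acc += vals[k] * x[col_idx[k]]
            y[row] = acc
    return y


# A 4x4 example with row lengths 1, 2, 0 and 4.
row_ptr = [0, 1, 3, 3, 7]
col_idx = [0, 1, 2, 0, 1, 2, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 1.0, 1.0, 1.0]

bins = bin_rows_by_nnz(row_ptr)
y = spmv_binned(row_ptr, col_idx, vals, x)
```

Because the bins are computed from the CSR row-pointer array alone, the matrix data is never reordered or converted, which is consistent with the abstract's claim that the approach avoids a significant preprocessing overhead.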
Keywords
SpMV, GPU, CSR, HYB, ACSR