Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations

International Journal of Parallel Programming(2021)

引用 2|浏览18
暂无评分
摘要
Segmented operations, such as segmented sum, segmented scan and segmented sort, are important building blocks for parallel irregular algorithms. We in this work propose a new parallel primitive called segmented merge. Its function is in parallel merging q sub-segments to p segments, both of possibly nonuniform lengths which easily cause the load balancing and the vectorization problems on massively parallel processors, such as GPUs. Our algorithm resolves these problems by first recording the boundaries of segments and sub-segments, then assigning roughly the same number of elements for GPU threads, and finally iteratively merging the sub-segments in each segment in the form of binary tree until there is only one sub-segment in each segment. We implement the segmented merge primitive on GPUs and demonstrate its efficiency on parallel sparse matrix transposition (SpTRANS) and sparse matrix–matrix multiplication (SpGEMM) operations. We conduct a comparative experiment with NVIDIA vendor library on two GPUs. The experimental results show that our algorithm achieve on average 3.94× (up to 13.09×) and 2.89× (up to 109.15×) speedup on SpTRANS and SpGEMM, respectively.
更多
查看译文
关键词
Parallel computing,Segmented merge,Sparse matrix,GPU
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要