Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform

Proceedings of the 47th International Conference on Parallel Processing (2018)

Abstract
SpMV (Sparse Matrix-Vector multiplication), in its simplest form y = Ax, multiplies a sparse matrix by a dense vector and is a widely used computing primitive in HPC. For the new SW26010 many-core platform, we propose a highly efficient CSR (Compressed Sparse Row) based implementation of parallel SpMV, referred to as SWCSR-SpMV in the sequel. SpMV in the CSR format can be trivially parallelized, but its performance is limited mainly by memory access efficiency. The major design goal for high-performance SpMV on the target platform is therefore to leverage the high-throughput memory access mechanism while avoiding redundant bandwidth usage. The original problem is sequentially partitioned into row slices, each of which can reside in the fast scratchpad memory, so that the loaded entries of x can be reused. Meanwhile, a dynamic look-ahead scheme is applied to avoid redundant memory accesses, and the many-core mesh is split into smaller communication scopes to facilitate sharing of common data across the working threads via the high-speed on-mesh data bus. In addition, to leverage massive parallelism, a balanced workload is ensured by both static and dynamic means. Performance is evaluated on a benchmark of 36 frequently used sparse matrices from fields such as graph computing, data mining, and computational fluid dynamics. With the performance upper bound defined as the ratio of the minimal required data access volume to the practically optimal bandwidth, ignoring computing overhead, SWCSR-SpMV achieves an efficiency of nearly 87%, maintaining over 75% for 1/3 of the test matrices. SWCSR-SpMV is further applied in a PETSc-based application, where a 1.75x-2.6x speedup is sustained in a multi-process environment on the Sunway TaihuLight supercomputer.
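To make the row-slice idea concrete, the following is a minimal sequential sketch of CSR-based SpMV processed one row slice at a time. It is an illustration under stated assumptions, not the SWCSR-SpMV implementation: the function name `csr_spmv_row_slices`, the `slice_rows` parameter, and the dictionary standing in for a scratchpad buffer are all hypothetical, and the look-ahead scheme, on-mesh data sharing, and load balancing described above are not modeled.

```python
def csr_spmv_row_slices(row_ptr, col_idx, vals, x, slice_rows):
    """Compute y = A @ x for a CSR matrix, one row slice at a time.

    Illustrative only: slice_rows is a hypothetical tuning parameter that
    stands in for "as many rows as fit in the scratchpad memory".
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for start in range(0, n_rows, slice_rows):
        end = min(start + slice_rows, n_rows)
        # Gather each distinct x entry needed by this slice exactly once.
        # On a scratchpad-based core this buffer would live in fast local
        # memory, so repeated column indices reuse the loaded value.
        lo, hi = row_ptr[start], row_ptr[end]
        x_local = {j: x[j] for j in set(col_idx[lo:hi])}
        for i in range(start, end):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += vals[k] * x_local[col_idx[k]]
            y[i] = acc
    return y


# 3x3 example matrix: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv_row_slices(row_ptr, col_idx, vals, x, slice_rows=2)
```

The result does not depend on the slice size; in this sketch, `slice_rows` only changes how many entries of x are gathered per slice and how often an entry is reloaded across slices.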
Keywords
Sparse Matrices, Sparse Matrix-Vector Multiplication, CSR, Parallel SpMV, SW26010, Sunway TaihuLight, Many-core