A Fast Sparse Triangular Solver For Structured-Grid Problems On Sunway Many-Core Processor Sw26010

PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (2018)

Abstract
The sparse triangular solver (SpTRSV) is one of the most essential kernels in many scientific and engineering applications. Efficiently parallelizing SpTRSV on modern many-core architectures is considerably difficult because of its inherent computational dependencies and discontinuous memory accesses. Achieving high SpTRSV performance is even more challenging on SW26010, the new-generation customized heterogeneous many-core processor that powers the top-ranked Sunway TaihuLight supercomputer. Owing to their regular sparsity pattern, structured-grid triangular problems have computing characteristics quite different from those of general problems, and they offer new opportunities for algorithm design on many-core architectures that have so far received little attention. In this work, we focus on how to design and implement a fast SpTRSV for structured-grid problems on SW26010. We propose a generalized algorithm framework for parallel SpTRSV, based on the fine-grained Producer-Consumer model, that makes the best use of the features and flexibility of the SW26010 many-core architecture. Moreover, we present a novel parallel structured-grid SpTRSV that uses direct data transfers across the registers of the computing elements of SW26010. Experiments on four typical structured-grid triangular problems with different problem sizes demonstrate that our SpTRSV achieves an average memory bandwidth utilization of 79.7%, as measured against the STREAM benchmark, which leads to a speedup of 17.7x over the serial version on SW26010. Furthermore, experiments with real-world sparse linear problems show that our proposed SpTRSV achieves superior preconditioning performance compared with the Intel Xeon E5-2670 v3 CPU and the Intel Xeon Phi 7210 (KNL) with DDR4 memory.
Keywords
Sunway TaihuLight, Triangular Solver, Structured Grid
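
To make the dependency structure mentioned in the abstract concrete, the following is a minimal sketch of a structured-grid SpTRSV in plain Python. It is not the paper's SW26010 implementation: the 2D grid, the lower-triangular stencil with only west and south neighbours, and all function and parameter names (`sptrsv_structured_2d`, `wavefronts`, `sptrsv_wavefront`, `diag`, `west`, `south`) are assumptions chosen for illustration. The first function is ordinary forward substitution; the second groups grid points into wavefronts (`i + j` constant), which is one common way to expose the parallelism that the regular sparsity pattern allows.

```python
def sptrsv_structured_2d(diag, west, south, b, nx, ny):
    """Serial forward substitution for L x = b on an nx-by-ny grid.

    Assumed (hypothetical) stencil: row k = j*nx + i of L holds diag[k] on
    the diagonal, west[k] in column k-1 (if i > 0), and south[k] in
    column k-nx (if j > 0).
    """
    x = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            acc = b[k]
            if i > 0:
                acc -= west[k] * x[k - 1]    # depends on the west neighbour
            if j > 0:
                acc -= south[k] * x[k - nx]  # depends on the south neighbour
            x[k] = acc / diag[k]
    return x


def wavefronts(nx, ny):
    """Group grid points by i + j; points in one group are mutually independent."""
    levels = [[] for _ in range(nx + ny - 1)]
    for j in range(ny):
        for i in range(nx):
            levels[i + j].append(j * nx + i)
    return levels


def sptrsv_wavefront(diag, west, south, b, nx, ny):
    """Same solve, ordered by wavefront; each inner loop could run in parallel."""
    x = [0.0] * (nx * ny)
    for level in wavefronts(nx, ny):
        for k in level:  # no point in this level reads another point in it
            i, j = k % nx, k // nx
            acc = b[k]
            if i > 0:
                acc -= west[k] * x[k - 1]
            if j > 0:
                acc -= south[k] * x[k - nx]
            x[k] = acc / diag[k]
    return x
```

In this sketch every point waits on at most two earlier points, so the stencil offsets replace the index arrays of a general sparse format. The memory accesses remain strided (`k - nx`), which hints at why bandwidth utilization is the metric the abstract reports.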