Probing the Underlying Implementation Mechanisms of SW26010

HPCC/DSS/SmartCity(2020)

Abstract
SW26010 is the heterogeneous many-core CPU used in the Sunway TaihuLight supercomputer, which ranked fourth on the June 2020 Top500 list. A large variety of applications have been tuned for SW26010, but only a few studies focus on system-level modeling and benchmarking. In this paper, we evaluate threading performance on SW26010, concentrating on the underlying implementations of systematic threading and of data transmission operations inside a thread, which are the two main overheads that degrade the overall performance of heterogeneous computing. We design several benchmarks to test and verify the underlying implementation mechanisms, and present a thorough analysis of the experimental results. We find that threading overhead depends on the number of active slave cores and is nontrivial when the data volume and flops/byte ratio are relatively low. Data transmission requests are handled serially, while data is transferred in parallel over four buses, each shared by two rows of slave cores (16 cores in total). In addition, when not all slave cores are used, the interleave mode performs better than the continuous mode in both request response and data transmission. Finally, several programming methods and optimization principles for SW26010 are proposed.
Keywords
SW26010,Overhead,Heterogeneous Many-core,Data Transmission,Threading
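
The bus-sharing finding in the abstract can be illustrated with a toy model. The sketch below is not the paper's benchmark: it assumes the 64 slave cores are numbered row by row in an 8x8 grid, that rows 2k and 2k+1 share bus k, and that "interleave" means choosing active cores with an even stride. The function names and both selection rules are hypothetical, chosen only to show why spreading a partial set of active cores across rows could engage more buses than packing them contiguously.

```python
from collections import Counter

# From the abstract: four buses, each shared by two rows of slave cores.
# The 8x8 row-major numbering below is an assumption of this sketch.
ROWS, COLS, BUSES = 8, 8, 4


def bus_of(core_id):
    """Assumed mapping: rows 2k and 2k+1 share bus k."""
    row = core_id // COLS
    return row // (ROWS // BUSES)


def continuous(n):
    """Hypothetical continuous mode: activate cores 0 .. n-1."""
    return list(range(n))


def interleaved(n):
    """Hypothetical interleave mode: activate n cores at an even stride."""
    stride = (ROWS * COLS) // n
    return [i * stride for i in range(n)]


def bus_load(cores):
    """Count how many active cores land on each bus."""
    return Counter(bus_of(c) for c in cores)


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, "continuous :", dict(bus_load(continuous(n))))
        print(n, "interleaved:", dict(bus_load(interleaved(n))))
```

Under these assumptions, 16 contiguous cores all sit on one bus, while 16 strided cores place four on each of the four buses. Whether the real hardware mapping matches this model is exactly what the paper's benchmarks are designed to probe.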