Optimizing Fusion PIC Code Performance at Scale on Cori Phase Two.

ISC Workshops(2017)

引用 25|浏览3
暂无评分
摘要
In this paper we present the results of optimizing the performance of the gyrokinetic full-f fusion PIC code XGC1 on the Cori Phase Two Knights Landing system. The code has undergone substantial development to enable the use of vector instructions in its most expensive kernels within the NERSC Exascale Science Applications Program. We study the single-node performance of the code on an absolute scale using the roofline methodology to guide optimization efforts. We have obtained 2({times }) speedups in single node performance due to enabling vectorization and performing memory layout optimizations. On multiple nodes, the code is shown to scale well up to 4000 nodes, near half the size of the machine. We discuss some communication bottlenecks that were identified and resolved during the work.
更多
查看译文
关键词
Fusion, Applications, Performance, Optimization, Particle in cell
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要