Optimizing the Gravitational Tree Algorithm for Many-Core Processors

Monthly Notices of the Royal Astronomical Society(2023)

引用 0|浏览2
暂无评分
摘要
Gravitational $N$-body simulations calculate numerous interactions between particles. The tree algorithm reduces these calculations by constructing a hierarchical oct-tree structure and approximating gravitational forces on particles. Over the last three decades, the tree algorithm has been extensively used in large-scale simulations, and its parallelization in distributed memory environments has been well studied. However, recent supercomputers are equipped with many CPU cores per node, and optimizations of the tree construction in shared memory environments are becoming crucial. We propose a novel tree construction method in contrast to the conventional top-down approach. It first creates all leaf cells without traversing the tree and then constructs the remaining cells by a bottom-up approach. We evaluated the performance of our novel method on the supercomputer Fugaku and an Intel machine. On a single thread, our method accelerates one of the most time-consuming processes of the conventional tree construction method by a factor of above 3.0 on Fugaku and 2.2 on the Intel machine. Furthermore, as the number of threads increases, our parallel tree construction time reduces considerably. Compared to the conventional sequential tree construction method, we achieve a speedup of over 45 on 48 threads of Fugaku and more than 56 on 112 threads of the Intel machine. In stark contrast to the conventional method, the tree construction with our method no longer constitutes a bottleneck in the tree algorithm, even when using many threads.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要