Optimizing the Gravitational Tree Algorithm for Many-Core Processors
Monthly Notices of the Royal Astronomical Society (2023)
Abstract
Gravitational $N$-body simulations compute numerous interactions between
particles. The tree algorithm reduces the number of these calculations by constructing a
hierarchical octree structure and approximating the gravitational forces on
particles. Over the last three decades, the tree algorithm has been extensively
used in large-scale simulations, and its parallelization in distributed-memory
environments has been well studied. However, recent supercomputers are equipped
with many CPU cores per node, so optimizing tree construction in
shared-memory environments is becoming crucial. We propose a novel tree
construction method that departs from the conventional top-down approach. It first
creates all leaf cells without traversing the tree and then constructs the
remaining cells in a bottom-up manner. We evaluated the performance of our
novel method on the supercomputer Fugaku and an Intel machine. On a single
thread, our method accelerates one of the most time-consuming steps of
conventional tree construction by a factor of more than 3.0 on Fugaku and
2.2 on the Intel machine. Furthermore, as the number of threads increases, the
parallel tree construction time of our method decreases considerably. Compared to the
conventional sequential tree construction method, we achieve speedups of over
45 on 48 threads of Fugaku and more than 56 on 112 threads of the Intel
machine. In stark contrast to the conventional method, tree construction
with our method no longer constitutes a bottleneck in the tree algorithm, even
when many threads are used.
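The leaf-first, bottom-up idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how leaf cells are identified, so this sketch assumes Morton (Z-order) keys on a uniform grid of fixed depth, particle positions in the unit cube, and the hypothetical helpers `morton_key`, `build_leaves`, `build_bottom_up`, and `center_of_mass`. Leaves are obtained by sorting particles by key instead of inserting them into a tree one at a time, and each coarser level is formed by merging child cells whose keys share a prefix (`key >> 3`). Production tree codes typically use adaptive leaves with a maximum particle count; the fixed depth here is a simplification.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def morton_key(ix: int, iy: int, iz: int, depth: int) -> int:
    """Interleave the bits of integer cell coordinates into a Z-order key.

    Assumption for illustration: the lowest 3 bits hold the finest level.
    """
    key = 0
    for b in range(depth):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def build_leaves(positions: Sequence[Vec3], depth: int) -> Dict[int, List[int]]:
    """Assign every particle directly to its finest-level cell (no tree walk).

    Positions are assumed to lie in the unit cube [0, 1)^3.
    """
    grid = 1 << depth
    keyed = []
    for i, pos in enumerate(positions):
        ix, iy, iz = (min(int(c * grid), grid - 1) for c in pos)
        keyed.append((morton_key(ix, iy, iz, depth), i))
    keyed.sort()  # particles sharing a leaf become contiguous in key order
    leaves: Dict[int, List[int]] = {}
    for key, i in keyed:
        leaves.setdefault(key, []).append(i)
    return leaves


def build_bottom_up(positions, masses, leaves, depth):
    """Build all coarser cells from the leaves, one level at a time.

    levels[d] maps a cell key at depth d to [mass, m*x, m*y, m*z] sums,
    from which each cell's total mass and center of mass follow.
    """
    levels = [dict() for _ in range(depth + 1)]
    # Finest level: accumulate mass moments of the particles in each leaf.
    for key, members in leaves.items():
        node = [0.0, 0.0, 0.0, 0.0]
        for i in members:
            m = masses[i]
            node[0] += m
            for axis in range(3):
                node[axis + 1] += m * positions[i][axis]
        levels[depth][key] = node
    # Coarser levels: a parent key is the child key without its last octant digit.
    for d in range(depth, 0, -1):
        for key, node in levels[d].items():
            parent = levels[d - 1].setdefault(key >> 3, [0.0, 0.0, 0.0, 0.0])
            for j in range(4):
                parent[j] += node[j]
    return levels


def center_of_mass(node) -> Vec3:
    """Convert accumulated [mass, m*x, m*y, m*z] sums to a center of mass."""
    mass = node[0]
    return (node[1] / mass, node[2] / mass, node[3] / mass)


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.1, 0.1), (0.9, 0.9, 0.9)]
    leaves = build_leaves(pts, depth=2)
    print(leaves)  # {0: [0, 1], 9: [2], 63: [3]}
    levels = build_bottom_up(pts, [1.0] * len(pts), leaves, depth=2)
    print(levels[0][0][0])  # total mass of the root cell: 4.0
```

Because each particle's key is computed independently and each level depends only on the level directly below it, both phases are natural candidates for shared-memory parallelization, although this sketch runs sequentially.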