Tree cutting approach for domain partitioning on forest-of-octrees-based block-structured static adaptive mesh refinement with lattice Boltzmann method

Parallel Computing (2021)

Abstract
An aerodynamics simulation code based on the lattice Boltzmann method (LBM) was implemented using forest-of-octrees-based block-structured adaptive mesh refinement (AMR) with temporally fixed refinement, and its performance was evaluated on GPU-based supercomputers. Although the space-filling-curve-based (SFC) domain partitioning algorithm for octree-based AMR has been widely used on conventional CPU-based supercomputers, accelerated computation on GPU-based supercomputers revealed a bottleneck caused by costly halo data communication. Our new tree cutting approach adopts a hybrid domain partitioning that combines a coarse structured block decomposition with SFC partitioning inside each block. This hybrid approach improved the locality and topology of the partitioned sub-domains and reduced the amount of halo communication to one-third of that of the original SFC approach. In the strong scaling test, the code achieved a maximum speedup of 1.82x at a performance of 2207 MLUPS (mega lattice updates per second) on 128 GPUs (NVIDIA® Tesla® V100). In the weak scaling test, the code achieved 9620 MLUPS on 128 GPUs with 4.473 billion grid points, while keeping a parallel efficiency of 93.4% from 8 to 128 GPUs.
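The hybrid partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes uniform-size leaf blocks on a cubic domain (no refinement levels), uses a Morton (Z-order) curve as the SFC, and all function names (`morton3d`, `sfc_partition`, `hybrid_partition`, `halo_faces`) are hypothetical. The leaves are first bucketed into coarse structured blocks, then each bucket is ordered along the curve and cut into contiguous chunks, one per rank.

```python
from collections import defaultdict

def morton3d(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into a Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def split_evenly(items, nparts):
    """Cut an ordered list into nparts contiguous chunks of near-equal size."""
    n = len(items)
    bounds = [n * p // nparts for p in range(nparts + 1)]
    return [items[bounds[p]:bounds[p + 1]] for p in range(nparts)]

def sfc_partition(leaves, nranks):
    """Plain SFC partitioning: sort leaves by Morton key, cut into nranks chunks."""
    ordered = sorted(leaves, key=lambda c: morton3d(*c))
    return split_evenly(ordered, nranks)

def hybrid_partition(leaves, coarse_shape, domain_size, ranks_per_block):
    """Assumed hybrid scheme: coarse structured blocks first, SFC inside each."""
    bx, by, bz = coarse_shape
    buckets = defaultdict(list)
    for (x, y, z) in leaves:
        block = (x * bx // domain_size,
                 y * by // domain_size,
                 z * bz // domain_size)
        buckets[block].append((x, y, z))
    parts = []
    for block in sorted(buckets):
        parts.extend(sfc_partition(buckets[block], ranks_per_block))
    return parts

def halo_faces(parts):
    """Count leaf faces shared by two different ranks (a crude halo-size proxy)."""
    owner = {leaf: rank for rank, part in enumerate(parts) for leaf in part}
    count = 0
    for (x, y, z), rank in owner.items():
        for nb in ((x + 1, y, z), (x, y + 1, z), (x, y, z + 1)):
            if nb in owner and owner[nb] != rank:
                count += 1
    return count

if __name__ == "__main__":
    n = 8
    leaves = [(x, y, z) for x in range(n) for y in range(n) for z in range(n)]
    parts = hybrid_partition(leaves, coarse_shape=(2, 2, 2),
                             domain_size=n, ranks_per_block=1)
    print(len(parts), "ranks,", halo_faces(parts), "inter-rank faces")
```

Calling `halo_faces` on the output of `sfc_partition` and of `hybrid_partition` for the same leaf set gives a rough way to compare the two strategies. The face count is only a proxy for communication volume, and on a refined mesh the result depends strongly on where the refinement is concentrated.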
Keywords
Adaptive mesh refinement (AMR), Static AMR, Lattice Boltzmann method, GPU