CC-RRTMG_SW++: Further optimizing a shortwave radiative transfer scheme on GPU

The Journal of Supercomputing (2022)

Abstract
Atmospheric radiation is one of the most important processes in atmospheric physics, and its high computational cost severely restricts the numerical simulation of atmospheric general circulation models. An efficient radiation parameterization scheme is therefore needed. Because of the computing power of GPUs, more and more numerical models are being ported to them. The CUDA C version (CC-RRTMG_SW) of the shortwave radiation scheme (RRTMG_SW) of the rapid radiative transfer model for general circulation models (RRTMG) already runs on GPUs, but its computing efficiency is not yet high, and the performance potential of the GPU is not fully realized. This paper is dedicated to optimizing CC-RRTMG_SW and exploring its maximum computing performance on GPUs. First, a three-dimensional acceleration algorithm for CC-RRTMG_SW is proposed. Then, optimization methods such as decoupling data dependencies, optimizing memory access, and I/O optimization are studied. Finally, the optimized version of CC-RRTMG_SW, named CC-RRTMG_SW++, is developed. The experimental results demonstrate that the proposed acceleration algorithm and performance optimization methods are effective. CC-RRTMG_SW++ achieved good acceleration on GPUs of different architectures, such as the NVIDIA Tesla K20, K40, and V100. Compared to RRTMG_SW running on a single Intel Xeon E5-2680 v2 CPU core, CC-RRTMG_SW++ obtained a speedup of 99.09× on a single V100 GPU without I/O transfer. Compared to CC-RRTMG_SW, the computing efficiency of CC-RRTMG_SW++ increased by 174.46
Keywords
Graphics processing unit, Compute unified device architecture programming, Performance optimization, Shortwave radiative transfer