Calculation of cross-correlation function accelerated by TensorFloat-32 Tensor Core operations on NVIDIA's Ampere and Hopper GPUs.

J. Comput. Sci. (2023)

Abstract
The cross-correlation function appears in many fields that work with time-series data, and speeding up its computation is essential now that improved measurement technologies have made massive observational data sets available. The cross-correlation function can be calculated as a matrix-matrix product, so a significant speed-up can be expected from Tensor Cores, the matrix-matrix product acceleration units of recent NVIDIA Graphics Processing Units (GPUs). In this research, we target the TensorFloat-32 Tensor Core operations, which are available in the Ampere, Ada, and Hopper architectures. We develop a fast calculation method that takes the characteristics of the cross-correlation function and of Tensor Cores into account, increasing arithmetic intensity and improving upon the performance of the matrix-matrix product baseline. On the A100 40GB SXM GPU, our method achieved 54 TFLOPS in a performance measurement that assumes seismic interferometry and uses actual data. Furthermore, with data-access latency hiding algorithms, we obtained 191 TFLOPS on the H100 PCIe GPU, a further 3.6-fold speedup over the proposed method without latency hiding on the A100 40GB SXM GPU. The accuracy of the calculation result is sufficient compared to the 64-bit floating-point calculation, indicating the applicability of Tensor Core operations using TensorFloat-32 for scientific calculations. Our proposed method is expected to make it possible to utilize large amounts of data more effectively in many fields. This paper is an extended version of an ICCS conference paper (Kikuchi et al., 2022), with extensions on the data-access latency hiding algorithms and results for H100 GPUs.
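The statement that the cross-correlation function can be calculated as a matrix-matrix product can be illustrated with a minimal sketch. It is a generic textbook formulation, not necessarily the formulation used in the paper: several windows x_i are stacked as the rows of one matrix, lagged copies of y form the columns of another, and their product holds c_i[k] = sum_n x_i[n] * y[n + k] for every window i and lag k. All function names are hypothetical, and plain Python lists stand in for the GPU matrices.

```python
def shifted_matrix(y, window, lags):
    """Build a (window x lags) matrix whose column k is y[k : k + window]."""
    if len(y) < window + lags - 1:
        raise ValueError("y is too short for the requested window and lags")
    return [[y[n + k] for k in range(lags)] for n in range(window)]


def matmul(a, b):
    """Plain product of a (M x N) and b (N x K); stands in for a GPU GEMM."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[n] * b[n][k] for n in range(inner)) for k in range(cols)]
        for row in a
    ]


def cross_correlation_direct(x, y, lags):
    """Reference definition: c[k] = sum_n x[n] * y[n + k]."""
    return [sum(x[n] * y[n + k] for n in range(len(x))) for k in range(lags)]


def cross_correlation_matmul(xs, y, lags):
    """Cross-correlate every row of xs with y using one matrix product."""
    window = len(xs[0])
    return matmul(xs, shifted_matrix(y, window, lags))


# Two short windows correlated against one signal at lags 0, 1, 2.
xs = [[1, 2, 3], [0, 1, 0]]
y = [1, 0, 2, 1, 3]
result = cross_correlation_matmul(xs, y, lags=3)
```

Each row of `result` matches `cross_correlation_direct` applied to the corresponding window. Batching many windows into one product is what lets a matrix unit such as a Tensor Core do the work; the arithmetic-intensity improvements and latency hiding described in the abstract are not reproduced here.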
Keywords
tensorfloat-32 core operations, hopper gpus, nvidias, cross-correlation