Systematic adaptation of stencil‐based 3D MPDATA to GPU architectures

Concurrency and Computation: Practice and Experience (2017)

Abstract
In this work, we focus on a systematic adaptation of the stencil-based multidimensional positive definite advection transport algorithm (MPDATA) to different graphics processing unit (GPU)-based computing platforms. Another objective of this work is to compare the performance of MPDATA on several platforms, including a multi-GPU system with two NVIDIA Tesla K80 cards, and single-card platforms with Tesla K20X, GeForce GTX TITAN, and GeForce GTX 980. The usage of the following optimization methods is proposed to improve the overall performance: (i) reducing the number of operations by subexpression elimination when implementing 2.5D blocking; (ii) reorganization of boundary conditions to reduce branch instructions; (iii) advanced memory management to increase coalesced memory access; and (iv) warp rearrangement to optimize data access to GPU global memory. The presented methods of adapting MPDATA to GPU architectures allow us to efficiently use many graphics processors within a single node by applying peer-to-peer data transfers between GPU global memories. We propose an auto-tuning procedure to compensate for architectural differences between the considered platforms. This procedure takes into account algorithm/GPU-specific parameters. The proposed approach to adapting MPDATA to GPU architectures allows us to achieve up to 482.5 Gflop/s for the platform equipped with two NVIDIA K80 GPUs. Copyright (c) 2016 John Wiley & Sons, Ltd.
Keywords
GPU, Kepler and Maxwell architectures, stencils, MPDATA, CUDA, auto-tuning