On the GPU Performance of 3D Stencil Computations Implemented in OpenCL.

Huayou Su, Nan Wu, Mei Wen, Chunyuan Zhang, Xing Cai

Lecture Notes in Computer Science (2013)

Abstract

To examine the OpenCL performance myth closely, this paper studies OpenCL implementations of several representative 3D stencil computations. We find that typical optimization techniques such as array padding, plane sweeping, and chunking give the OpenCL implementations performance boosts similar to those obtained in the corresponding CUDA programs. The key to good performance lies in maximizing the use of a GPU's on-chip resources, which holds for both OpenCL and CUDA programming. In most cases, the FLOPS rates achieved on NVIDIA's Fermi and Kepler GPUs are fully comparable between the two programming alternatives. For four typical 3D stencil computations, the OpenCL implementations are on average 9% and 2% faster than their CUDA counterparts on the GTX590 and Tesla K20, respectively. At the moment, the only clear advantage of CUDA programming for stencil computations arises from CUDA's ability to use the read-only data cache on NVIDIA's Kepler GPUs. The skepticism about OpenCL's GPU performance thus seems unjustified for 3D stencil computations.
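To make two of the optimizations named in the abstract concrete, the following is a minimal CPU-side sketch in C, not the paper's kernels. It assumes a 7-point stencil with hypothetical coefficients `c0` and `c1`; the function names `padded_pitch` and `stencil7_sweep` are illustrative only. Array padding is shown as rounding the row length up to an alignment boundary, and plane sweeping as walking each (x, y) column along z while keeping the values below and at the current plane in local variables, as a single GPU work-item or thread would. Chunking and on-chip local or shared memory are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Array padding: round the row length up to a multiple of `align` elements
 * so that every row starts on an aligned boundary. */
static size_t padded_pitch(size_t nx, size_t align) {
    return (nx + align - 1) / align * align;
}

/* 7-point stencil over interior points only (boundary entries of `out` are
 * left untouched): out = c0 * center + c1 * (sum of the six neighbours).
 * `pitch` is the padded row length in elements. */
static void stencil7_sweep(const float *in, float *out,
                           size_t nx, size_t ny, size_t nz,
                           size_t pitch, float c0, float c1) {
    assert(pitch >= nx && nx >= 3 && ny >= 3 && nz >= 3);
    const size_t plane = pitch * ny;

    /* On a GPU, these two loops would be the 2D grid of work-items. */
    for (size_t y = 1; y + 1 < ny; ++y) {
        for (size_t x = 1; x + 1 < nx; ++x) {
            const size_t col = y * pitch + x;
            float below = in[col];          /* value at z = 0 */
            float curr  = in[col + plane];  /* value at z = 1 */

            /* Plane sweep: advance along z, reusing values already loaded. */
            for (size_t z = 1; z + 1 < nz; ++z) {
                const size_t i = col + z * plane;
                const float above = in[i + plane];
                out[i] = c0 * curr
                       + c1 * (below + above
                               + in[i - 1] + in[i + 1]
                               + in[i - pitch] + in[i + pitch]);
                below = curr;
                curr  = above;
            }
        }
    }
}
```

With this layout, each input value along z is loaded once per column instead of three times. In a GPU kernel the four in-plane neighbours would typically be served from on-chip memory rather than read directly from `in`, which is the part of the technique this sketch leaves out.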

Keywords

GPU programming, OpenCL, CUDA, stencil computations
