On the GPU-CPU Performance Portability of OpenCL for 3D Stencil Computations

ICPADS '13: Proceedings of the 2013 International Conference on Parallel and Distributed Systems (2013)

Abstract
Although OpenCL programming provides full code portability between different hardware platforms, performance portability can be far from satisfactory. In this work, we use a set of representative 3D stencil computations to study OpenCL's performance portability between GPUs and CPUs. For each stencil computation, we have devised different implementations of the computational kernel function, all being 100% code-portable between the two architectures. The most straightforward and compact implementation gives satisfactory CPU performance but performs poorly on GPUs, because such an implementation hampers effective use of the GPU hardware. By injecting code complexity into the involved loop nests, we can create kernel functions that still have full code portability but with increased performance portability. It is found that spatial data blocking and register reuse can be beneficial for performance on both GPUs and CPUs, whereas use of OpenCL's local memory (and subsequent temporal blocking) may only have positive effects on GPUs.
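The register-reuse idea mentioned in the abstract can be illustrated with a minimal sketch in plain C. It is not the paper's kernel code: the 7-point stencil, the coefficients `c0` and `c1`, and the names `idx`, `stencil7_naive`, `stencil7_zreuse`, and `max_abs_diff` are all assumptions chosen for illustration. The first function is the compact form, in which every output point loads all seven inputs from memory. The second marches along z for each (i, j) column and carries the values below and at the current point in scalar variables, so only one new z-value is loaded per step. In an OpenCL kernel the two outer loops of the second function would typically be replaced by work-items, one per column.

```c
#include <stddef.h>

/* Hypothetical helper: flattened index into an nx*ny*nz grid, x fastest. */
static size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny) {
    return (k * ny + j) * nx + i;
}

/* Compact version: each interior point reads all seven values from memory. */
void stencil7_naive(const float *in, float *out,
                    size_t nx, size_t ny, size_t nz, float c0, float c1) {
    if (nx < 3 || ny < 3 || nz < 3) return;  /* no interior points */
    for (size_t k = 1; k + 1 < nz; ++k)
        for (size_t j = 1; j + 1 < ny; ++j)
            for (size_t i = 1; i + 1 < nx; ++i)
                out[idx(i, j, k, nx, ny)] =
                    c0 * in[idx(i, j, k, nx, ny)] +
                    c1 * (in[idx(i - 1, j, k, nx, ny)] + in[idx(i + 1, j, k, nx, ny)] +
                          in[idx(i, j - 1, k, nx, ny)] + in[idx(i, j + 1, k, nx, ny)] +
                          in[idx(i, j, k - 1, nx, ny)] + in[idx(i, j, k + 1, nx, ny)]);
}

/* Register-reuse version: per (i, j) column, slide a three-value window
 * (below, center, above) along z so that two of the three z-values are
 * carried over in scalars instead of being reloaded. */
void stencil7_zreuse(const float *in, float *out,
                     size_t nx, size_t ny, size_t nz, float c0, float c1) {
    if (nx < 3 || ny < 3 || nz < 3) return;  /* no interior points */
    for (size_t j = 1; j + 1 < ny; ++j) {
        for (size_t i = 1; i + 1 < nx; ++i) {
            float below  = in[idx(i, j, 0, nx, ny)];
            float center = in[idx(i, j, 1, nx, ny)];
            for (size_t k = 1; k + 1 < nz; ++k) {
                float above = in[idx(i, j, k + 1, nx, ny)];  /* only new z load */
                out[idx(i, j, k, nx, ny)] =
                    c0 * center +
                    c1 * (in[idx(i - 1, j, k, nx, ny)] + in[idx(i + 1, j, k, nx, ny)] +
                          in[idx(i, j - 1, k, nx, ny)] + in[idx(i, j + 1, k, nx, ny)] +
                          below + above);
                below  = center;  /* shift the window one step up in z */
                center = above;
            }
        }
    }
}

/* Largest absolute difference between two arrays of length n. */
float max_abs_diff(const float *a, const float *b, size_t n) {
    float m = 0.0f;
    for (size_t t = 0; t < n; ++t) {
        float d = a[t] - b[t];
        if (d < 0.0f) d = -d;
        if (d > m) m = d;
    }
    return m;
}
```

Both functions write only interior points and leave the boundary of `out` untouched, so they can be compared directly with `max_abs_diff`. The loop order of the second function is chosen to expose the sliding window, not for CPU cache behaviour; spatial blocking, which the abstract also mentions, is a separate transformation that this sketch does not show.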
Keywords
OpenCL, 3D stencil computation, performance portability, GPU-CPU