Improving Performance of GPU Specific OpenCL Program on CPUs

Parallel and Distributed Computing, Applications and Technologies (2012)

Abstract
OpenCL provides a unified programming interface for various parallel computing platforms. The OpenCL framework offers good functional portability: programs can run without modification on any platform that supports OpenCL. However, most OpenCL programs are optimized for massively parallel processors such as GPUs, and it is hard to achieve good performance on general-purpose multi-core processors without sophisticated modification of these GPU-specific programs. The major reason is the immense gap between CPU and GPU architectures. In this paper, we evaluate the performance portability of OpenCL programs between CPUs and GPUs, and analyse why GPU-specific OpenCL programs are ill-suited to CPUs. Based on profiling, we propose three optimization strategies for improving the performance of GPU-specific OpenCL programs on CPUs: increasing the granularity of task partitioning, optimizing the use of the memory hierarchy, and block-based data access. We apply the proposed techniques to several benchmarks. The experimental results show that the optimized OpenCL programs achieve speedups of 2x to 4x on CPUs compared with their GPU-specific counterparts.
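To make the first strategy concrete, the following plain C sketch models what increasing the granularity of task partitioning could look like. It is not taken from the paper: the function names (`scale_fine`, `scale_coarse`, `run_fine`, `run_coarse`), the scaling operation, and the `block` parameter are all hypothetical, and the host-side loops only stand in for the way an OpenCL runtime schedules work-items.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a GPU-style kernel body: each work-item
   (identified by gid) produces exactly one output element. */
static void scale_fine(const float *in, float *out, float factor, size_t gid)
{
    out[gid] = in[gid] * factor;
}

/* Coarsened variant (assumed, for illustration): each work-item handles a
   contiguous block of elements, so far fewer work-items are needed. */
static void scale_coarse(const float *in, float *out, float factor,
                         size_t gid, size_t block, size_t n)
{
    size_t begin = gid * block;
    size_t end = (begin + block < n) ? begin + block : n; /* clamp last block */
    for (size_t i = begin; i < end; ++i)
        out[i] = in[i] * factor;
}

/* Serial loop modelling the launch of n fine-grained work-items. */
void run_fine(const float *in, float *out, float factor, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        scale_fine(in, out, factor, gid);
}

/* Serial loop modelling the launch of ceil(n / block) coarse work-items.
   Returns the number of work-items that were "launched". */
size_t run_coarse(const float *in, float *out, float factor,
                  size_t n, size_t block)
{
    assert(block > 0);
    size_t work_items = (n + block - 1) / block;
    for (size_t gid = 0; gid < work_items; ++gid)
        scale_coarse(in, out, factor, gid, block, n);
    return work_items;
}
```

Both variants compute the same result; the coarse one trades many tiny work-items for a few larger ones that each walk a contiguous range. On a CPU this kind of mapping typically reduces per-work-item scheduling overhead and keeps each core's accesses contiguous, though the actual benefit depends on the OpenCL implementation and the kernel.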
Keywords
gpu specific opencl program performance improvement,high performance,optimisation,opencl program portability performance evaluation,opencl program,block-based data access,parallel programming,improving performance,gpu specific opencl program,massively parallel processors,gpu architecture,graphics processing units,good functional portability,gpu specific,software performance evaluation,profiling,multiprocessing programs,parallel computing platforms,cpu architecture,software portability,functional portability,opencl framework,good performance,opencl programming,optimized opencl program,memory hierarchy usage optimization strategies,opencl,unified programming interface,performance portability,task partition granularity