Automatic Mapping Single-Device OpenCL Program to Heterogeneous Multi-device Platform

HPCC/EUC(2013)

Abstract
In this paper, we propose a framework that automatically maps single-device OpenCL programs to heterogeneous multi-device platforms with performance in mind. The framework builds on the independence of work groups, which is inherent in the OpenCL programming model, and relies heavily on knowing which global memory regions each work group accesses. We therefore analyze the global memory access patterns of work groups and design an abstract representation, CCRwS, that describes the exact memory access region of each memory access statement in a kernel. A global memory access analyzer derives the CCRwSs by static program analysis of the kernel code. Based on the CCRwSs, the framework fully controls data transfer between the host and the multiple devices. A kernel code regenerator then distributes the workload and applies architecture-specific optimizations through code transformation. We tested the framework on a platform with 2 Intel E5-2650 CPUs and 4 NVIDIA Tesla C2050 GPUs. Compared with execution on a single GPU, kernels running on all 6 devices are about 4.5x faster.
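
The two ideas the abstract combines, splitting independent work groups across devices and knowing which memory region each share touches, can be illustrated with a minimal Python sketch. This is not the paper's CCRwS representation or its analyzer: the function names, the per-device weights, and the restriction to a one-dimensional access of the form a[stride * gid + offset] with a positive stride are all assumptions made for illustration.

```python
def split_work_groups(num_groups, weights):
    """Split work-group IDs [0, num_groups) into contiguous half-open
    ranges, one per device, proportional to the (hypothetical) weights."""
    total = sum(weights)
    bounds = [0]
    acc = 0
    for w in weights:
        acc += w
        # Cumulative share of the work groups, rounded down.
        bounds.append(num_groups * acc // total)
    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]


def access_region(group_range, local_size, stride=1, offset=0):
    """Half-open element range touched by a[stride * gid + offset] for all
    work-items in the given work groups (assumes stride > 0).
    Returns None when the device received no work groups."""
    first_group, end_group = group_range
    if first_group == end_group:
        return None
    first_gid = first_group * local_size      # lowest global ID in the range
    last_gid = end_group * local_size - 1     # highest global ID in the range
    return (stride * first_gid + offset, stride * last_gid + offset + 1)


# Example: 16 work groups of 64 work-items over three devices,
# with the third device assumed to be twice as fast as the others.
ranges = split_work_groups(16, [1, 1, 2])
regions = [access_region(r, local_size=64) for r in ranges]
```

With a region like this per device, a host runtime would only need to copy the corresponding slice of each buffer to that device before launch, rather than the whole buffer. Real kernels have multi-dimensional index spaces and less regular accesses, which is what a representation such as CCRwS is meant to capture.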
Keywords
global memory access patterns, global memory access regions, static program analysis, multi-device, parallel programming, OpenCL programming model, abstract representation CCRwS, graphics processing units, code transformation, program diagnostics, NVIDIA Tesla C2050 GPU, heterogeneous multi-device platform, Intel E5-2650 CPU, automatic, multiple devices, automatic mapping single-device OpenCL program, kernel codes, performance, data transfer, benchmark testing, indexes, kernel, computer architecture, optimization