Piper: Pipelining OpenMP Offloading Execution Through Compiler Optimization For Performance

Konstantinos Parasyris, Giorgis Georgakoudis, Johannes Doerfert, Ignacio Laguna, Thomas R.W. Scogland

2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC) (2022)

Abstract

OpenMP offload reduces the development complexity of HPC GPU codes and provides portability. A source of poor performance is the lockstep execution of data transfers and computation. Overlapping these operations can provide significant performance gains. However, the developer must manually slice data transfers and kernel execution, and efficiently schedule these operations for execution, which is a hard and error-prone task.

We propose Piper, an automatic mechanism that lets OpenMP offload overlap data transfers with computation. Piper statically analyzes offload kernels and associates computations with memory locations. The extended runtime system exploits this analysis information, divides a kernel into independent sub-tasks, and schedules them for pipelined execution so that transfers and computation overlap. At any point in time, Piper also controls the coarseness and number of sub-tasks executed. By doing so, Piper allows the execution of kernels whose memory requirements exceed the GPU device memory. Piper speeds up execution by up to 2.67× compared to OpenMP offload execution.
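To make the manual slicing mentioned in the abstract concrete, the following is a minimal sketch of what a developer might write by hand with standard OpenMP directives. It is not Piper's implementation; the function name `scale_pipelined`, the element-wise kernel, and the `chunk` parameter are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: out[i] = factor * in[i], processed in chunks.
 * Each chunk becomes its own deferred target task (nowait), so an OpenMP
 * runtime is free to overlap one chunk's data transfers with another chunk's
 * computation. Whether overlap actually happens depends on the runtime and
 * the device. */
void scale_pipelined(const float *in, float *out, size_t n, size_t chunk,
                     float factor)
{
    assert(chunk > 0);
    for (size_t lo = 0; lo < n; lo += chunk) {
        /* The last chunk may be shorter than the others. */
        size_t len = (n - lo < chunk) ? n - lo : chunk;

        /* Map only this slice of the arrays: copy in[lo:len] to the device
         * before the kernel and copy out[lo:len] back after it. */
        #pragma omp target teams distribute parallel for \
            map(to: in[lo:len]) map(from: out[lo:len]) nowait
        for (size_t i = lo; i < lo + len; ++i)
            out[i] = factor * in[i];
    }
    /* Wait for all outstanding chunk tasks before the host reads out. */
    #pragma omp taskwait
}
```

The chunks here are independent because each one reads and writes a disjoint slice, which is the property the abstract says Piper derives by static analysis. This sketch does not limit how many chunks are in flight at once, whereas the abstract states that Piper controls the coarseness and number of sub-tasks executed. When compiled without OpenMP support the pragmas are ignored and the loop runs sequentially on the host with the same result.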

Keywords

OpenMP, GPGPU, memory optimization
