Exploring a Multi-Resolution GPU Programming Model for Chapel

2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2020)

Abstract
Accelerators, especially GPUs, are a common source of performance improvement in HPC clusters, so there is a growing need to support them. When programming GPUs with Chapel, programmers typically start by writing forall loops and running them on CPUs as a proof of concept. If the resulting CPU performance is insufficient, the next step could be to try automatic compiler-based GPU code generation techniques [1], [2]. For portions that remain performance bottlenecks even after automatic compilation, the next step is to write GPU kernels in CUDA, HIP, or OpenCL and invoke them from the Chapel program using the GPUIterator [3], [4] and Chapel's C interoperability feature.
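The last step of that workflow, calling hand-written kernel code through C interoperability, can be illustrated with a minimal sketch of the C side. The function name `vc_range`, its signature, and the use of 0-based inclusive bounds are assumptions made for illustration and are not taken from the paper. In a real setup the body would copy data to the device and launch a CUDA, HIP, or OpenCL kernel; here a plain loop stands in for the kernel so the sketch runs without a GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C entry point that a Chapel program could declare with
 * `extern proc` and call for the GPU portion of an iteration space.
 *
 * Copies B[lo..hi] into A[lo..hi], with both bounds inclusive and 0-based
 * (an assumption of this sketch). A real implementation would launch a
 * GPU kernel here; the loop below is only a CPU stand-in.
 */
void vc_range(float *A, const float *B, int64_t lo, int64_t hi) {
    assert(A != NULL && B != NULL);
    for (int64_t i = lo; i <= hi; i++) {
        A[i] = B[i];
    }
}
```

Passing the sub-range as explicit bounds keeps the C function independent of how the iteration space is divided, so the same entry point could serve any split between CPU and GPU.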
Keywords
automatic compilation, GPU kernels, Chapel program, multiresolution GPU programming model, GPU accelerators, HPC clusters, forall loops, Chapel C interoperability feature, automatic compiler-based GPU code generation, CUDA, OpenCL, HIP