Locality-Based Optimizations in the Chapel Compiler

Languages and Compilers for Parallel Computing (LCPC 2021), 2022

Abstract
One of the main challenges of distributed-memory programming is achieving efficient access to data. Low-level programming paradigms such as MPI and SHMEM require programmers to explicitly move data between compute nodes, which typically results in good execution performance at the expense of programmer productivity. High-level paradigms such as the Chapel programming language aim to reduce programming difficulty by supporting a global memory view. However, the implicit communication afforded by the global memory view can make it easier for programmers to overlook performance considerations. In this paper, we show that the same high-level Chapel abstractions that enable easier programming, such as data-parallel loops and distributed arrays, also enable powerful compiler analyses and optimizations that can mitigate the overheads of implicit communication. We demonstrate two compiler optimizations added to the Chapel compiler in versions 1.23 and 1.24. These optimizations rely on the use of data-parallel loops and distributed arrays to strength-reduce accesses to global memory and to aggregate remote accesses. We test these optimizations with the STREAM-Triad and index-gather benchmarks and show that they result in around 2x performance improvements on a Cray XC supercomputer. Furthermore, we analyze two real-world applications, chplUltra and Arkouda, that use manual remedies to avoid the overheads targeted by these optimizations. We observe that more than half of the places in the source code where these remedies are applied could benefit from the optimizations without any programmer effort.
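To make the two loop patterns concrete, the sketch below shows the kind of Chapel code these optimizations target. It is illustrative only, not taken from the paper: the array names and sizes are made up, and it uses the `dmapped Block(...)` distribution syntax of the Chapel 1.23/1.24 era the paper discusses.

```chapel
// A sketch of the two patterns (names and sizes are illustrative).
use BlockDist, Random;

config const n = 1_000_000;
config const alpha = 3.0;

// Block-distribute the index space across all locales.
const Space = {1..n};
const D = Space dmapped Block(boundingBox=Space);

var A, B, C: [D] real;      // STREAM-Triad operands
var src, dst, idx: [D] int; // index-gather operands

B = 1.0;
C = 2.0;
src = 1;
fillRandom(idx);

// STREAM-Triad: the loop iterates over the arrays' own distributed
// domain, so A[i], B[i], and C[i] are provably local. The automatic
// local-access optimization (added in Chapel 1.23) can strength-reduce
// these global-memory accesses into direct local loads and stores.
forall i in D do
  A[i] = B[i] + alpha * C[i];

// Index gather: src[...] is generally remote because idx is random.
// The automatic aggregation optimization (added in Chapel 1.24) can
// batch these fine-grained remote reads into larger transfers.
forall i in D do
  dst[i] = src[mod(idx[i], n) + 1];
```

Without such optimizations, each access in the first loop would go through the general global-memory access path, and each remote read in the second would be an individual fine-grained transfer; these are the overheads the abstract's reported ~2x improvements come from.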
Keywords
Parallel programming, Compiler optimizations, Productivity