Automatic translation of MPI source into a latency-tolerant, data-driven form.
J. Parallel Distrib. Comput.(2017)
摘要
Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboos performance meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic level optimization against a well-known library. Bamboo is a translator that can reformulate MPI source into a task graph form.Bamboo supports both point-to-point and collective communication.Bamboo supports GPUs, hiding communication among GPUs and between hosts and GPUs.Bamboo speeds up applications containing elaborate data and control structures.
更多查看译文
关键词
Automatic communication hiding,Source-to-source translator,Task dependency graph,Data-driven execution
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要