Automatic translation of MPI source into a latency-tolerant, data-driven form.

J. Parallel Distrib. Comput. (2017)

Abstract
Hiding communication behind useful computation is an important performance programming technique, but it remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. We reformulate MPI source into a task dependency graph representation, which partially orders the tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo's performance meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic-level optimization against a well-known library.

Bamboo is a translator that can reformulate MPI source into a task graph form. Bamboo supports both point-to-point and collective communication. Bamboo supports GPUs, hiding communication among GPUs and between hosts and GPUs. Bamboo speeds up applications containing elaborate data and control structures.
Keywords
Automatic communication hiding, Source-to-source translator, Task dependency graph, Data-driven execution
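
The idea of data-driven execution over a task dependency graph can be illustrated with a minimal sketch. It is not Bamboo's translator, runtime, or API, and it does not use MPI. The function `run_data_driven` and the task names (`recv_halo`, `interior`, `boundary`, `combine`) are hypothetical, chosen only to show how a task fires once all of its inputs have arrived, so that work with no pending inputs is never held back by an outstanding message.

```python
from collections import deque


def run_data_driven(tasks, deps):
    """Run tasks in a data-driven order (hypothetical helper, not Bamboo's API).

    tasks: dict mapping task name -> callable taking a dict of inputs
    deps:  dict mapping task name -> list of predecessor task names
    Returns (execution order, results by task name).
    """
    # Inputs each task is still waiting for.
    pending = {name: set(deps.get(name, [])) for name in tasks}

    # Reverse edges: which tasks consume the output of each task.
    consumers = {name: [] for name in tasks}
    for name, preds in pending.items():
        for pred in preds:
            consumers[pred].append(name)

    inbox = {name: {} for name in tasks}  # values delivered so far
    ready = deque(sorted(n for n, preds in pending.items() if not preds))
    order, results = [], {}

    while ready:
        name = ready.popleft()
        results[name] = tasks[name](inbox[name])
        order.append(name)
        # Deliver the result; a consumer becomes ready once its last input arrives.
        for consumer in consumers[name]:
            inbox[consumer][name] = results[name]
            pending[consumer].discard(name)
            if not pending[consumer]:
                ready.append(consumer)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order, results


# Hypothetical stencil-like example: interior work needs no remote data,
# while boundary work must wait for the halo "message".
tasks = {
    "recv_halo": lambda inputs: [1, 2],                    # stands in for a received message
    "interior": lambda inputs: 10,                         # independent computation
    "boundary": lambda inputs: sum(inputs["recv_halo"]),   # needs the halo data
    "combine": lambda inputs: inputs["interior"] + inputs["boundary"],
}
deps = {"boundary": ["recv_halo"], "combine": ["interior", "boundary"]}

order, results = run_data_driven(tasks, deps)
```

The graph only partially orders the tasks: `interior` and `recv_halo` are unordered with respect to each other, which is the freedom a runtime can exploit to overlap communication with computation. This sequential sketch only demonstrates the firing rule; a real runtime would schedule ready tasks while messages are still in flight.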