Partitioned Reduction for Heterogeneous Environments.

Alessio De Rango, Gladys Utrera, Marisa Gil, Xavier Martorell, Andrea Giordano, Donato D'Ambrosio, Giuseppe Mendicino

International Euromicro Conference on Parallel, Distributed and Network-Based Processing (2024)

Abstract
Performance work in HPC applications today focuses on the efficiency of MPI, the de facto message-passing library for exploiting parallelism. Features such as multithreading and the overlap of communication with computation are continuously studied to adapt MPI to new platforms and to larger numbers of processing units, such as those on GPU platforms. To this end, the MPI-4.0 standard recently introduced partitioned point-to-point communication primitives to increase the overlap of computation and communication. This paper introduces an extension to MPI that applies partitioned communication to MPI reduction primitives. Traditional reductions process the complete input vector only after the GPU computation has finished. In contrast, the proposed approach uses message partitioning to perform the reduction incrementally: individual partitions of the input vector are processed as they become available, so the reduction does not have to wait for the GPU computation to complete. Our results show promising benefits, particularly for large message sizes. However, synchronization points remain potential bottlenecks, and optimizing them requires careful analysis.
Keywords
MPI, partitioned communication, GPU programming, distributed computing
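
The incremental reduction described in the abstract can be illustrated with a small, single-process sketch. It is not the authors' implementation and does not use MPI. Threads stand in for ranks whose GPU kernels finish one partition at a time, and a queue stands in for the "partition ready" notification that MPI-4.0 expresses for point-to-point transfers with calls such as `MPI_Pready` and `MPI_Parrived`. The names `partition_bounds`, `producer`, and `partitioned_reduce` are hypothetical and chosen only for this illustration.

```python
import queue
import threading


def partition_bounds(length, num_partitions):
    """Split range(length) into num_partitions contiguous (start, end) slices."""
    base, extra = divmod(length, num_partitions)
    bounds, start = [], 0
    for p in range(num_partitions):
        end = start + base + (1 if p < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def producer(rank_vector, bounds, ready):
    """Stand-in for one rank's GPU kernel (assumption): publish each
    partition as soon as it has been 'computed'."""
    for p, (start, end) in enumerate(bounds):
        ready.put((p, rank_vector[start:end]))


def partitioned_reduce(rank_vectors, num_partitions):
    """Element-wise sum across ranks, applied partition by partition
    in whatever order the partitions become available."""
    length = len(rank_vectors[0])
    bounds = partition_bounds(length, num_partitions)
    ready = queue.Queue()

    workers = [
        threading.Thread(target=producer, args=(vector, bounds, ready))
        for vector in rank_vectors
    ]
    for worker in workers:
        worker.start()

    result = [0] * length
    # Reduce each arriving partition immediately instead of waiting
    # for every rank to finish its whole vector.
    for _ in range(len(rank_vectors) * num_partitions):
        p, chunk = ready.get()
        start = bounds[p][0]
        for offset, value in enumerate(chunk):
            result[start + offset] += value

    for worker in workers:
        worker.join()
    return result
```

For example, `partitioned_reduce([[1, 2, 3, 4], [10, 20, 30, 40]], 2)` yields the same element-wise sum as a reduction performed after all data is ready. Only the order in which partitions are folded into the result differs, which is why the sketch relies on a commutative and associative operation such as integer addition.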