Generalized Collective Algorithms for the Exascale Era

2023 IEEE International Conference on Cluster Computing (CLUSTER)

Abstract
Exascale supercomputers have renewed the urgency of improving distributed communication, specifically MPI collectives. Previous works accelerated collectives for specific scenarios by changing the radix of the collective algorithms. However, these approaches fail to explore the interplay between modern hardware features, such as multi-port networks, and software features, such as message size. In this paper, we present a novel approach that uses system-agnostic, generalized (i.e., variable-radix) algorithms to capture relevant features and provide broad speedups for upcoming exascale-class supercomputers. We identify hardware commonalities found on announced exascale systems and three omnipresent communication kernels (binomial tree, ring, and recursive doubling) that can be generalized to better leverage these features, creating 10 total implementations. For each kernel, we develop analytical models to intuit algorithm performance with varying radix values. Experiments on the world's first exascale supercomputer (Frontier at ORNL) and a pre-exascale system (Polaris at ANL) show that our generalized algorithms outperform the baseline open-source and proprietary vendor MPI implementations by a significant margin, in some cases more than 4.5x. We empirically determine optimal algorithms and parameter values, identifying where the analytical models are accurate and where hardware features directly determine performance. Most notably, we show how a single, system-agnostic implementation of a generalized algorithm can optimize for multiple hardware/software features across multiple systems.
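
For orientation (this sketch is not taken from the paper), the code below shows what a generalized, radix-k ("k-nomial") tree broadcast looks like when built on plain MPI point-to-point calls: setting k = 2 recovers the classic binomial tree, while larger radices trade fewer communication rounds for more sends per round. The function name, tag value, and radix choice in main are illustrative assumptions, not the authors' implementation.

```c
/* Minimal sketch of a radix-k ("k-nomial") tree broadcast over MPI
 * point-to-point calls. k = 2 gives the classic binomial tree; the radix
 * is the tunable parameter that generalized collective algorithms expose.
 * Names and structure are illustrative, not the paper's implementation. */
#include <mpi.h>
#include <stdio.h>

#define KNOMIAL_TAG 0  /* arbitrary tag for this sketch */

void knomial_bcast(void *buf, int count, MPI_Datatype dtype,
                   int root, int k, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Work with ranks relative to the root so the tree is rooted at 0. */
    int relrank = (rank - root + size) % size;

    /* Receive phase: the lowest non-zero base-k digit of relrank determines
     * the round in which this rank receives and who its parent is. */
    int mask = 1;
    while (mask < size) {
        if (relrank % (k * mask)) {
            int parent = relrank - relrank % (k * mask);
            MPI_Recv(buf, count, dtype, (parent + root) % size,
                     KNOMIAL_TAG, comm, MPI_STATUS_IGNORE);
            break;
        }
        mask *= k;
    }

    /* Send phase: forward to up to (k-1) children per remaining level,
     * largest subtrees first. */
    mask /= k;
    while (mask > 0) {
        for (int j = 1; j < k; j++) {
            int child = relrank + j * mask;
            if (child < size)
                MPI_Send(buf, count, dtype, (child + root) % size,
                         KNOMIAL_TAG, comm);
        }
        mask /= k;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = (rank == 0) ? 42 : -1;                        /* payload */
    knomial_bcast(&value, 1, MPI_INT, 0, 4, MPI_COMM_WORLD);  /* radix k = 4 */
    printf("rank %d received %d\n", rank, value);

    MPI_Finalize();
    return 0;
}
```

Under a simple alpha-beta cost model, such a tree completes in about ceil(log_k P) rounds with up to (k-1) sequential sends per round, i.e., roughly ceil(log_k P) * (k-1) * (alpha + n*beta) for an n-byte message; this is the kind of radix trade-off that the paper's analytical models capture when choosing k per system and message size.
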
Keywords
Exascale computing, collective communication, MPI