ACCLAiM: Advancing the Practicality of MPI Collective Communication Autotuning Using Machine Learning

2022 IEEE International Conference on Cluster Computing (CLUSTER)

Abstract
MPI collective communication is an omnipresent communication model for high-performance computing (HPC) systems. The performance of a collective operation depends strongly on the algorithm used to implement it. MPI libraries use inaccurate heuristics to select these algorithms, causing applications to suffer unnecessary slowdowns. Machine learning (ML)-based autotuners are a promising alternative. ML autotuners can intelligently select algorithms for individual jobs, resulting in near-optimal performance. However, these approaches currently spend more time training than they save by accelerating applications, rendering them impractical. We make the case that ML-based collective algorithm selection autotuners can be made practical and can accelerate production applications on large-scale supercomputers. We identify multiple impracticalities in the existing work, such as inefficient training point selection and ignoring non-power-of-two feature values. We address these issues through variance-based point selection and model testing, alongside topology-aware benchmark parallelization. Our approach minimizes training time by eliminating unnecessary training points and maximizing machine utilization. We incorporate our improvements in a prototype active learning system, ACCLAiM (Advancing Collective Communication aLgorithm Autotuning usIng Machine Learning). We show that each of ACCLAiM's advancements significantly reduces training time compared with the best existing machine learning approach. Then we apply ACCLAiM on a leadership-class supercomputer and demonstrate the conditions under which ACCLAiM can accelerate HPC applications, proving the advantage of ML autotuners in a production setting for the first time.
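The variance-based point selection the abstract mentions is a standard active-learning idea: benchmark next the configuration the model is least certain about, so that every expensive benchmark run adds maximal information. The sketch below illustrates that idea only; it is not the authors' implementation. The feature set (log2 message size and node count, including non-power-of-two counts), the synthetic timings, and the use of random-forest ensemble variance as the uncertainty measure are all assumptions made for illustration.

# Illustrative sketch of variance-based training point selection for a
# collective-performance model. NOT the ACCLAiM implementation; the
# features, data, and uncertainty criterion are assumed for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Candidate feature points: (log2 message size, node count). Non-power-of-two
# node counts are included, since the abstract notes they must be covered.
candidates = np.array([(m, n) for m in range(0, 24, 2)
                               for n in (2, 3, 4, 6, 8, 12, 16, 24)])

# Seed set: a few already-benchmarked points with measured latencies
# (synthetic values here stand in for real benchmark timings).
X = candidates[rng.choice(len(candidates), size=8, replace=False)]
y = X[:, 0] * 0.5 + X[:, 1] * 0.1 + rng.normal(0, 0.1, len(X))

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Disagreement among the trees gives a per-candidate variance estimate;
# the point with the highest variance is benchmarked next.
per_tree = np.stack([t.predict(candidates) for t in model.estimators_])
variance = per_tree.var(axis=0)
next_point = candidates[np.argmax(variance)]
print("next point to benchmark (log2 msg size, nodes):", next_point)

In a loop of this kind, stopping once the maximum ensemble variance falls below a threshold is one plausible way to trade training time against model accuracy, which is the trade-off the abstract targets; the actual selection and stopping criteria used by ACCLAiM are described in the paper itself.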
Keywords
MPI, collective communication, autotuning, machine learning