MSCCLang: Microsoft Collective Communication Language.

ASPLOS (2)(2023)

引用 0|浏览29
暂无评分
摘要
Machine learning models with millions or billions of parameters are increasingly trained and served on large multi-GPU systems. As models grow in size and execute on more GPUs, collective communication becomes a bottleneck. Custom collective algorithms optimized for both particular network topologies and application-specific communication patterns can alleviate this bottleneck and help these applications scale. However, implementing correct and efficient custom algorithms is challenging. This paper introduces MSCCLang, a system for programmable GPU communication. MSCCLang provides a domain specific language for writing collective communication algorithms and an optimizing compiler for lowering them to an executable form, which can be executed efficiently and flexibly in an interpreter-based runtime. We used MSCCLang to write novel collective algorithms for AllReduce and AllToAll that are up to 1.9× and 1.3× faster than hand-optimized implementations, respectively.
更多
查看译文
关键词
GPU, Collective Communication, Compilers
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要