AMPeD: An Analytical Model for Performance in Distributed Training of Transformers.

ISPASS 2023

Abstract
Transformers are a class of machine learning models that have attracted significant interest recently for a multitude of reasons: they can process multiple modalities efficiently and have excellent scalability. Despite these obvious advantages, training these large models is very time-consuming. Hence, there have been efforts to speed up the training process using efficient distributed implementations. Many different types of parallelism have been identified that can be employed standalone or in combination. However, naively combining different parallelization schemes can incur significant communication overheads, thereby potentially defeating the purpose of distributed training. Thus, it becomes vital to predict the right mapping of different parallelisms to the underlying system architecture. In this work, we propose AMPeD, an analytical model for performance in distributed training of transformers. It exposes all the transformer model parameters, the potential parallelism choices (along with their mapping onto the system), and the accelerator and system architecture specifications as tunable knobs, thereby enabling hardware-software co-design. With the help of three case studies, we show that the combinations of parallelisms predicted to be efficient by AMPeD conform with the results from the state-of-the-art literature. Using AMPeD, we also show that future distributed systems consisting of optical communication substrates can train large models up to 4x faster than the current state-of-the-art systems without modifying the peak computational power of the accelerators. Finally, we validate AMPeD with in-house experiments on real systems and via published literature; the maximum observed error is limited to 12%. The model is available here: https://github.com/CSA-infra/AMPeD
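To illustrate the kind of analytical model the abstract describes, the following is a minimal sketch of a performance estimate that exposes model, system, and parallelism parameters as tunable knobs. It is not AMPeD's actual interface: all knob names and the coarse roofline-style compute and all-reduce cost formulas below are assumptions made for illustration only; the real model in the linked repository is more detailed.

```python
# Minimal illustrative sketch of an analytical performance model for
# distributed transformer training. This is NOT AMPeD's actual API;
# knob names and cost formulas are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class ModelKnobs:
    """Transformer model parameters exposed as tunable knobs."""
    num_layers: int = 96
    hidden_size: int = 12288
    seq_length: int = 2048
    global_batch_size: int = 1536


@dataclass
class SystemKnobs:
    """Accelerator and system-architecture specifications."""
    peak_flops: float = 312e12       # peak FLOP/s per accelerator
    link_bandwidth: float = 300e9    # bytes/s per inter-accelerator link
    num_accelerators: int = 1024


@dataclass
class ParallelismKnobs:
    """Mapping of the different parallelism degrees onto the system."""
    tensor_parallel: int = 8
    pipeline_parallel: int = 16
    data_parallel: int = 8


def estimate_iteration_time(m: ModelKnobs, s: SystemKnobs, p: ParallelismKnobs) -> float:
    """Very coarse estimate of one training iteration (seconds)."""
    # Approximate FLOPs per iteration for a dense transformer (fwd + bwd),
    # using the common 6 * parameters * tokens rule of thumb.
    params = 12 * m.num_layers * m.hidden_size ** 2
    tokens = m.global_batch_size * m.seq_length
    total_flops = 6 * params * tokens

    # Compute time: FLOPs split evenly across all accelerators.
    compute_time = total_flops / (s.peak_flops * s.num_accelerators)

    # Communication time: gradient all-reduce over the data-parallel group
    # (2 bytes per parameter in the local shard, ring all-reduce approximation).
    shard_bytes = 2 * params / (p.tensor_parallel * p.pipeline_parallel)
    allreduce_bytes = 2 * shard_bytes * (p.data_parallel - 1) / p.data_parallel
    comm_time = allreduce_bytes / s.link_bandwidth

    return compute_time + comm_time


if __name__ == "__main__":
    t = estimate_iteration_time(ModelKnobs(), SystemKnobs(), ParallelismKnobs())
    print(f"Estimated time per iteration: {t:.2f} s")
```

Even such a simplified sketch shows why the choice of parallelism mapping matters: changing the tensor-, pipeline-, and data-parallel degrees shifts the balance between the compute and communication terms, which is the trade-off AMPeD models in much finer detail.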
Keywords
Analytical Modeling, Transformers, Distributed Training, Performance