3D Parallelism for Transformers via Integer Programming

ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (2024)

Abstract
Transformer models such as BERT, GPT, and ViT have been applied to a wide range of areas in recent years due to their effectiveness. To improve the training efficiency of Transformer models, various distributed training approaches have been proposed, such as Megatron-LM [8]. However, when multi-dimensional parallelism strategies are considered, existing works cannot harmonize the different strategies well enough to obtain a globally optimal solution because of the problem's complexity. In this paper, we propose a parallelism strategy search algorithm, PTIP, which generates operator-level parallelism strategies combining three schemes: data parallelism, tensor parallelism, and pipeline parallelism. PTIP abstracts these three parallelism schemes simultaneously into an auxiliary graph, reformulates the search problem as a mixed-integer programming (MIP) problem, and uses a MIP solver to obtain a high-quality multi-dimensional strategy. Experiments conducted on Transformers demonstrate that PTIP achieves a 13.9%-24.7% performance improvement over Megatron-LM [8].
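To give a feel for the kind of mixed-integer formulation such a search can reduce to, below is a minimal sketch, not the paper's actual PTIP formulation. It assigns each operator in a toy graph one of three parallelism strategies while minimizing an assumed per-operator compute cost plus a resharding penalty on edges whose endpoints pick different strategies. The operator names, cost numbers, and the choice of the PuLP modeling library are all illustrative assumptions.

```python
# Illustrative sketch only: a toy MIP that picks one parallelism strategy per
# operator, in the spirit of reformulating strategy search as an MIP.
# Operators, strategies, and costs are made-up assumptions, not paper values.
import pulp

operators = ["embed", "attn", "mlp", "head"]          # toy operator chain
edges = [("embed", "attn"), ("attn", "mlp"), ("mlp", "head")]
strategies = ["data", "tensor", "pipeline"]

# Hypothetical per-operator execution cost under each strategy.
compute_cost = {
    ("embed", "data"): 4, ("embed", "tensor"): 6, ("embed", "pipeline"): 5,
    ("attn",  "data"): 9, ("attn",  "tensor"): 5, ("attn",  "pipeline"): 7,
    ("mlp",   "data"): 8, ("mlp",   "tensor"): 5, ("mlp",   "pipeline"): 6,
    ("head",  "data"): 3, ("head",  "tensor"): 4, ("head",  "pipeline"): 4,
}
reshard_cost = 2  # assumed cost when adjacent operators use different strategies

prob = pulp.LpProblem("toy_parallelism_search", pulp.LpMinimize)

# x[op, s] = 1 if operator `op` uses strategy `s`.
x = pulp.LpVariable.dicts(
    "x", [(op, s) for op in operators for s in strategies], cat="Binary"
)
# z[u, v] = 1 if edge (u, v) joins operators with different strategies.
z = pulp.LpVariable.dicts("z", edges, cat="Binary")

# Each operator picks exactly one strategy.
for op in operators:
    prob += pulp.lpSum(x[(op, s)] for s in strategies) == 1

# Linearized mismatch detection: if u and v differ on any strategy s,
# one of the two differences below is 1, forcing z[u, v] = 1.
for (u, v) in edges:
    for s in strategies:
        prob += z[(u, v)] >= x[(u, s)] - x[(v, s)]
        prob += z[(u, v)] >= x[(v, s)] - x[(u, s)]

# Objective: total compute cost plus resharding cost on mismatched edges.
prob += (
    pulp.lpSum(compute_cost[(op, s)] * x[(op, s)]
               for op in operators for s in strategies)
    + reshard_cost * pulp.lpSum(z[e] for e in edges)
)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for op in operators:
    chosen = next(s for s in strategies if x[(op, s)].value() > 0.5)
    print(op, "->", chosen)
```

The same pattern extends to a real operator graph by replacing the toy costs with profiled compute and communication costs and adding device-memory and pipeline-stage constraints, which is the scale at which an off-the-shelf MIP solver becomes the search engine.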
Keywords
Transformers, Auto-parallelism