TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

International Conference on Machine Learning (ICML), Vol. 139, 2021

Cited by 70 | Viewed 479
Abstract
Model parallelism has become a necessity for training modern large-scale deep language models. In this work, we identify a new and orthogonal dimension from existing model-parallel approaches: it is possible to perform pipeline parallelism within a single training sequence for Transformer-based language models thanks to their autoregressive property. This enables a more fine-grained pipeline compared with previous work. With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models. We develop a novel dynamic programming-based algorithm to calculate the optimal pipelining execution scheme given a specific model and cluster configuration. We show that TeraPipe can speed up the training by 5.0x for the largest GPT-3 model with 175 billion parameters on an AWS cluster with 48 p3.16xlarge instances compared with state-of-the-art model-parallel methods. The code for reproduction can be found at https://github.com/zhuohan123/terapipe
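The abstract mentions a dynamic programming-based algorithm that chooses how a training sequence is split into token-level slices for the pipeline. The following is a minimal Python sketch of that idea only, not the authors' implementation: it assumes a hypothetical per-slice cost model stage_time(k) and approximates total pipeline latency as the sum of slice compute times plus (num_stages - 1) times the cost of the largest slice.

    from functools import lru_cache

    def optimal_slicing(seq_len, num_stages, stage_time):
        """Sketch: pick token-slice sizes for one sequence to minimize an
        estimated pipeline latency under an assumed cost model."""
        best_latency, best_slices = float("inf"), None

        # Enumerate a cap `m` on the slice size; under the assumed latency
        # model, the largest slice bounds how quickly stages are refilled.
        for m in range(1, seq_len + 1):

            @lru_cache(maxsize=None)
            def min_total(remaining):
                # Minimal summed compute time to cover `remaining` tokens
                # with slices of at most `m` tokens, plus the chosen sizes.
                if remaining == 0:
                    return 0.0, ()
                best = (float("inf"), ())
                for k in range(1, min(m, remaining) + 1):
                    rest_cost, rest_sizes = min_total(remaining - k)
                    cand = (stage_time(k) + rest_cost, (k,) + rest_sizes)
                    if cand[0] < best[0]:
                        best = cand
                return best

            total, sizes = min_total(seq_len)
            latency = total + (num_stages - 1) * stage_time(m)
            if latency < best_latency:
                best_latency, best_slices = latency, sizes

        return best_latency, best_slices

    # Example with a purely illustrative cost model: a fixed per-slice
    # overhead plus a quadratic (attention-like) term in the slice length.
    latency, slices = optimal_slicing(
        seq_len=32, num_stages=4, stage_time=lambda k: 1.0 + 0.01 * k * k)
    print(latency, slices)

In practice the per-slice cost would come from profiling the model on the target cluster; stage_time above is only a stand-in for such a measurement, and the latency formula is a simplification of the scheme described in the paper.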
Keywords
pipeline, language models, token-level, large-scale