Whale: Scaling Deep Learning Model Training to the Trillions

arXiv (Cornell University), 2020

Abstract
Scaling up deep neural networks has been proven effective in improving model quality, but it also brings ever-growing training challenges. This paper presents Whale, an automatic and hardware-aware distributed training framework for giant models. Whale generalizes the expression of parallelism with four primitives, which can define various parallel strategies as well as flexible hybrid strategies, including combination and nesting patterns. It allows users to build models at an arbitrary scale by adding a few annotations and automatically transforms the local model into a distributed implementation. Moreover, Whale is hardware-aware and highly efficient even when training on GPUs of mixed types, which meets the growing demand for heterogeneous training in industrial clusters. Whale sets a milestone for training the largest multimodal pretrained model M6. The success of M6 is achieved by Whale's design to decouple algorithm modeling from system implementations, i.e., algorithm developers can focus on model innovation, since it takes only three lines of code to scale the M6 model to trillions of parameters on a cluster of 480 GPUs.
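The abstract states that a local model is scaled by adding a few annotations that the framework then rewrites into a distributed implementation. A minimal sketch of what such annotation-style scoping could look like is shown below; the module name whale, the primitive names (replica, pipeline, stage), their arguments, and the encoder/decoder helpers are all assumptions for illustration, not the framework's confirmed API.

# Minimal sketch of annotation-style hybrid parallelism; all names below are
# hypothetical illustrations, not Whale's documented interface.
import whale as wh  # hypothetical import


def build_encoder(inputs):
    ...  # placeholder: local single-device encoder definition


def build_decoder(hidden):
    ...  # placeholder: local single-device decoder definition


def build_model(inputs):
    # Nesting scopes expresses a hybrid strategy: a data-parallel (replica)
    # scope wrapped around a pipeline scope, whose stages the framework
    # places on different devices automatically.
    with wh.replica():                        # hypothetical data-parallel scope
        with wh.pipeline(num_micro_batch=8):  # hypothetical pipeline scope
            with wh.stage():                  # first pipeline stage
                hidden = build_encoder(inputs)
            with wh.stage():                  # second pipeline stage
                return build_decoder(hidden)

The point of the sketch is the design choice the abstract describes: the model body stays local, single-device code, and the parallel strategy is expressed only through the enclosing annotations.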
Keywords
deep learning model training, deep learning, trillions