PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
CoRR(2023)
摘要
Asynchronous pipeline model parallelism with a "1F1B" (one forward, one
backward) schedule generates little bubble overhead and always provides quite a
high throughput. However, the "1F1B" schedule inevitably leads to weight
inconsistency and weight staleness issues due to the cross-training of
different mini-batches across GPUs. To simultaneously address these two
problems, in this paper, we propose an optimizer-dependent weight prediction
strategy (a.k.a PipeOptim) for asynchronous pipeline training. The key insight
of our proposal is that we employ a weight prediction strategy in the forward
pass to ensure that each mini-batch uses consistent and staleness-free weights
to compute the forward pass. To be concrete, we first construct the weight
prediction scheme based on the update rule of the used optimizer when training
the deep neural network models. Then throughout the "1F1B" pipelined training,
each mini-batch is mandated to execute weight prediction ahead of the forward
pass, subsequently employing the predicted weights to perform the forward pass.
As a result, PipeOptim 1) inherits the advantage of the "1F1B" schedule and
generates pretty high throughput, and 2) can ensure effective parameter
learning regardless of the type of the used optimizer. To verify the
effectiveness of our proposal, we conducted extensive experimental evaluations
using eight different deep-learning models spanning three machine-learning
tasks including image classification, sentiment analysis, and machine
translation. The experiment results demonstrate that PipeOptim outperforms the
popular pipelined approaches including GPipe, PipeDream, PipeDream-2BW, and
SpecTrain. The code of PipeOptim will be accessible at
https://github.com/guanleics/PipeOptim.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要