Contractive error feedback for gradient compression
CoRR(2023)
摘要
On-device memory concerns in distributed deep learning have become severe due
to (i) the growth of model size in multi-GPU training, and (ii) the wide
adoption of deep neural networks for federated learning on IoT devices which
have limited storage. In such settings, communication efficient optimization
methods are attractive alternatives, however they still struggle with memory
issues. To tackle these challenges, we propose an communication efficient
method called contractive error feedback (ConEF). As opposed to SGD with
error-feedback (EFSGD) that inefficiently manages memory, ConEF obtains the
sweet spot of convergence and memory usage, and achieves communication
efficiency by leveraging biased and all-reducable gradient compression. We
empirically validate ConEF on various learning tasks that include image
classification, language modeling, and machine translation and observe that
ConEF saves 80\% - 90\% of the extra memory in EFSGD with almost no loss on
test performance, while also achieving 1.3x - 5x speedup of SGD. Through our
work, we also demonstrate the feasibility and convergence of ConEF to clear up
the theoretical barrier of integrating ConEF to popular memory efficient
frameworks such as ZeRO-3.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要