Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning
CoRR (2024)
Abstract
Gradient compression has surfaced as a key technique to address the challenge
of communication efficiency in distributed learning. In distributed deep
learning, however, it is observed that gradient distributions are heavy-tailed,
with outliers significantly influencing the design of compression strategies.
Existing parameter quantization methods experience performance degradation when
this heavy-tailed feature is ignored. In this paper, we introduce a novel
compression scheme specifically engineered for heavy-tailed gradients, which
effectively combines gradient truncation with quantization. This scheme is
adeptly implemented within a communication-limited distributed Stochastic
Gradient Descent (SGD) framework. Considering a general family of heavy-tailed
gradients that follow a power-law distribution, we aim to minimize the error
resulting from quantization, thereby determining optimal values for two
critical parameters: the truncation threshold and the quantization density. We
provide a theoretical analysis of the convergence error bound under both
uniform and non-uniform quantization scenarios. Comparative experiments with
other benchmarks demonstrate the effectiveness of our proposed method in
managing the heavy-tailed gradients in a distributed learning environment.
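
The combination of truncation and quantization described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the stochastic rounding rule, and the values of `alpha` (truncation threshold) and `levels` (number of quantization points) are assumptions chosen for illustration, whereas the paper derives optimal values for the threshold and the quantization density. Only the uniform case is shown.

```python
import random

def truncate(g, alpha):
    """Clip every gradient entry to [-alpha, alpha] (hypothetical helper)."""
    return [max(-alpha, min(alpha, x)) for x in g]

def quantize_uniform(g, alpha, levels, rng):
    """Stochastically round entries of g (assumed to lie in [-alpha, alpha])
    onto `levels` evenly spaced points; requires levels >= 2."""
    step = 2.0 * alpha / (levels - 1)
    out = []
    for x in g:
        pos = (x + alpha) / step            # position measured in grid steps
        low = min(int(pos), levels - 1)     # index of the grid point below x
        frac = pos - low                    # distance to that point, in [0, 1)
        up = 1 if rng.random() < frac else 0  # round up with probability frac
        idx = min(low + up, levels - 1)
        out.append(-alpha + idx * step)
    return out

def compress(g, alpha, levels, rng):
    """Truncate first, then quantize: outliers no longer stretch the grid."""
    return quantize_uniform(truncate(g, alpha), alpha, levels, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic heavy-tailed "gradient": symmetric Pareto (power-law) samples.
    g = [rng.choice((-1, 1)) * rng.paretovariate(2.5) for _ in range(8)]
    print(compress(g, alpha=3.0, levels=5, rng=rng))
```

Without the truncation step, the grid would have to span the largest outlier, which spreads the same number of levels over a much wider range and coarsens the resolution for the bulk of the entries. Truncation trades a bias on the few clipped entries for finer resolution elsewhere.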