Inverse-Free Fast Natural Gradient Descent Method for Deep Learning
arXiv (2024)
Abstract
Second-order methods can converge much faster than first-order methods by
incorporating second-order derivatives or statistics, but they are far less
prevalent in deep learning due to their computational inefficiency. To address
this, many existing solutions focus on reducing the size of the matrix to be
inverted. However, the inverse still has to be computed in each iteration. In
this paper, we present a fast natural gradient descent
(FNGD) method, which only requires computing the inverse during the first
epoch. First, we reformulate the gradient preconditioning formula in
natural gradient descent (NGD) as a weighted sum of per-sample gradients using
the Sherman-Morrison-Woodbury formula. Building on this, to avoid the inverse
operation otherwise required in every iteration to compute these coefficients,
the weighted coefficients are shared across epochs without affecting empirical
performance.
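
The reformulation can be sketched as follows. This is a minimal NumPy illustration under assumed definitions, not the paper's implementation: it takes a damped empirical Fisher F = J^T J / N + damping * I built from a matrix J whose rows are per-sample gradients (the exact curvature matrix and damping scheme in the paper may differ). Applying the Sherman-Morrison-Woodbury identity to F^{-1} g, with g the mean gradient, gives F^{-1} g = J^T (N * damping * I + J J^T)^{-1} 1, i.e., a weighted sum of per-sample gradients whose coefficient vector has only N entries.

```python
import numpy as np

def smw_weighted_coefficients(per_sample_grads, damping):
    """Coefficient vector k such that F^{-1} g = J^T k under the assumed
    damped empirical Fisher F = J^T J / N + damping * I, with g = J^T 1 / N.

    By the Sherman-Morrison-Woodbury identity,
        F^{-1} g = J^T (N * damping * I_N + J J^T)^{-1} 1,
    so only an N x N system is solved (N = batch size << d = #parameters).
    """
    J = per_sample_grads                      # (N, d), rows = per-sample gradients
    N = J.shape[0]
    gram = J @ J.T                            # N x N Gram matrix J J^T
    return np.linalg.solve(N * damping * np.eye(N) + gram, np.ones(N))

def weighted_sum_step(per_sample_grads, coeffs):
    # Preconditioned gradient as a weighted sum of per-sample gradients.
    # FNGD-style reuse: `coeffs` is computed once (in the first epoch) and then
    # kept fixed, so later iterations involve no matrix inverse.
    return per_sample_grads.T @ coeffs

# Sanity check against the explicit damped-Fisher solve on random data.
rng = np.random.default_rng(0)
N, d, damping = 8, 32, 1e-2
J = rng.normal(size=(N, d))
g = J.mean(axis=0)                            # mean gradient g = J^T 1 / N
F = J.T @ J / N + damping * np.eye(d)         # assumed damped empirical Fisher
k = smw_weighted_coefficients(J, damping)
assert np.allclose(np.linalg.solve(F, g), weighted_sum_step(J, k), atol=1e-8)
```

In this view, a plain averaged-gradient step is the special case where every coefficient equals 1/N; FNGD instead reuses the SMW-derived coefficients computed in the first epoch, which is why its per-iteration cost can approach that of first-order methods.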
FNGD thus approximates NGD as a fixed-coefficient weighted sum, akin to the
simple average used in first-order methods. Consequently, the computational complexity
of FNGD can approach that of first-order methods. To demonstrate the efficiency
of the proposed FNGD, we perform empirical evaluations on image classification
and machine translation tasks. For training ResNet-18 on the CIFAR-100 dataset,
FNGD achieves a speedup of 2.05× compared with KFAC. For training a
Transformer on Multi30K, FNGD outperforms AdamW by 24 BLEU points while
requiring almost the same training time.