Forward Gradient-Based Frank-Wolfe Optimization for Memory Efficient Deep Neural Network Training
CoRR (2024)
Abstract
Training a deep neural network with gradient-based methods requires computing
the gradient at each layer. However, calculating gradients via backpropagation,
i.e., reverse-mode differentiation, incurs significant memory consumption,
which renders backpropagation an inefficient method for computing gradients.
This paper analyzes the performance of the well-known Frank-Wolfe algorithm,
a.k.a. the conditional gradient algorithm, when it only has access to gradients
computed by forward-mode automatic differentiation. We provide in-depth
technical details showing that the proposed algorithm converges to the optimal
solution at a sub-linear rate, given access to a noisy estimate of the true
gradient obtained via forward-mode automatic differentiation, referred to as
the Projected Forward Gradient. In contrast, the standard Frank-Wolfe
algorithm, when provided with access to the Projected Forward Gradient, fails
to converge to the optimal solution. We demonstrate the convergence properties
of our proposed algorithm with a numerical example.
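To make the setting concrete, here is a minimal JAX sketch of the idea behind the abstract, not the paper's exact algorithm. It assumes the Projected Forward Gradient is the standard forward-gradient estimate (∇f(θ)·v)v with v a random Gaussian direction, computed with a single forward-mode pass (jax.jvp). The names forward_gradient, simplex_lmo, and frank_wolfe_forward are hypothetical, as is the running average of gradient estimates, which is one plausible device (assumed here) for taming the estimator's noise, motivated by the abstract's claim that plain Frank-Wolfe fails with this noisy gradient.

```python
import jax
import jax.numpy as jnp

def forward_gradient(f, theta, key):
    """Forward-gradient estimate (grad f . v) v via one forward-mode pass."""
    # Random tangent direction with E[v v^T] = I, so the estimate is unbiased.
    v = jax.random.normal(key, theta.shape, theta.dtype)
    # jax.jvp returns (f(theta), directional derivative <grad f(theta), v>).
    _, dd = jax.jvp(f, (theta,), (v,))
    return dd * v

def simplex_lmo(grad):
    """Linear minimization oracle over the probability simplex:
    argmin_{s in simplex} <grad, s> puts all mass on the smallest coordinate."""
    s = jnp.zeros_like(grad)
    return s.at[jnp.argmin(grad)].set(1.0)

def frank_wolfe_forward(f, theta0, key, iters=500):
    theta = theta0
    d_avg = jnp.zeros_like(theta0)
    for t in range(1, iters + 1):
        key, sub = jax.random.split(key)
        g = forward_gradient(f, theta, sub)
        # Assumed smoothing step: average the noisy forward gradients so the
        # direction used by the LMO becomes progressively less noisy.
        rho = 1.0 / t
        d_avg = (1.0 - rho) * d_avg + rho * g
        s = simplex_lmo(d_avg)
        gamma = 2.0 / (t + 2)  # standard Frank-Wolfe step size
        theta = theta + gamma * (s - theta)
    return theta

if __name__ == "__main__":
    # Toy problem: minimize a quadratic over the simplex; the minimizer
    # (0.2, 0.5, 0.3) already lies in the feasible set.
    f = lambda x: jnp.sum((x - jnp.array([0.2, 0.5, 0.3])) ** 2)
    theta0 = jnp.ones(3) / 3.0
    print(frank_wolfe_forward(f, theta0, jax.random.PRNGKey(0)))
```

Feeding the raw one-sample estimate g straight into simplex_lmo illustrates the failure mode the abstract describes: the LMO's vertex choice keeps jumping with the noise, so the iterates do not settle at the optimum, whereas a direction whose error shrinks over iterations preserves the sub-linear Frank-Wolfe rate.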