Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks
arXiv (2022)
Abstract
Due to their simplicity and efficiency, first-order gradient methods have
been extensively employed in training neural networks. Although the
optimization problem of a neural network is non-convex, recent research has
proved that first-order methods are capable of attaining a global minimum
when training over-parameterized neural networks, where the number of
parameters is significantly larger than the number of training instances.
Momentum methods, including the heavy ball (HB) method and Nesterov's
accelerated gradient (NAG) method, are the workhorses of first-order gradient
methods owing to their accelerated convergence. In practice, NAG often
exhibits superior performance to HB. However, existing theoretical work fails
to distinguish their convergence behavior in training neural networks. To
fill this gap, we consider the training problem of a two-layer ReLU neural
network under over-parameterization and random initialization. Leveraging
high-resolution dynamical systems and neural tangent kernel (NTK) theory, our
result not only establishes tighter upper bounds on the convergence rates of
both HB and NAG, but also provides the first theoretical guarantee for the
acceleration of NAG over HB in training neural networks. Finally, we validate
our theoretical results on three benchmark datasets.
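To make the contrast between the two momentum methods concrete: HB adds a momentum term to a gradient evaluated at the current iterate, while NAG evaluates the gradient at a look-ahead (extrapolated) point. The following is a minimal sketch of the two standard update rules, not the paper's exact formulation; the toy quadratic objective and the values of the step size `eta`, momentum `beta`, and iteration count `T` are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: HB vs. NAG on a hypothetical ill-conditioned quadratic
# f(w) = 0.5 * w^T A w. All hyperparameters below are illustrative.
import numpy as np

A = np.diag([1.0, 100.0])        # toy ill-conditioned quadratic (assumption)
w0 = np.array([1.0, 1.0])
eta, beta, T = 1e-3, 0.9, 2000   # illustrative step size, momentum, iterations

def grad(w):
    # Gradient of f(w) = 0.5 * w^T A w.
    return A @ w

# Heavy ball (HB): gradient taken at the current iterate w_t, plus momentum.
#   w_{t+1} = w_t - eta * grad(w_t) + beta * (w_t - w_{t-1})
w, w_prev = w0.copy(), w0.copy()
for _ in range(T):
    w, w_prev = w - eta * grad(w) + beta * (w - w_prev), w
print("HB  final loss:", 0.5 * w @ A @ w)

# Nesterov (NAG): gradient taken at the look-ahead point y_t.
#   y_t     = w_t + beta * (w_t - w_{t-1})
#   w_{t+1} = y_t - eta * grad(y_t)
w, w_prev = w0.copy(), w0.copy()
for _ in range(T):
    y = w + beta * (w - w_prev)   # look-ahead extrapolation
    w, w_prev = y - eta * grad(y), w
print("NAG final loss:", 0.5 * w @ A @ w)
```

The only difference between the two loops is where `grad` is evaluated; this look-ahead step is what the paper's high-resolution dynamical-systems analysis exploits to separate NAG's convergence rate from HB's.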