Acceleration of stochastic gradient descent with momentum by averaging: finite-sample rates and asymptotic normality
CoRR (2023)
Abstract
Stochastic gradient descent with momentum (SGDM) has been widely used in many
machine learning and statistical applications. Despite the observed empirical
benefits of SGDM over traditional SGD, the theoretical understanding of the
role of momentum for different learning rates in the optimization process
remains largely open. We analyze the finite-sample convergence rate of SGDM
in the strongly convex setting and show that, with a large batch size, the
mini-batch SGDM converges faster than the mini-batch SGD to a neighborhood of
the optimal value. Additionally, our findings, supported by theoretical
analysis and numerical experiments, indicate that SGDM permits broader choices
of learning rates. Furthermore, we analyze the Polyak-averaging version of the
SGDM estimator, establish its asymptotic normality, and justify its asymptotic
equivalence to the averaged SGD. The asymptotic distribution of the averaged
SGDM enables uncertainty quantification of the algorithm output and statistical
inference of the model parameters.
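As an illustration of the scheme discussed in the abstract, the following is a minimal sketch (not the authors' implementation) of mini-batch heavy-ball SGDM with Polyak averaging of the iterates, run on a toy strongly convex quadratic. The objective, batch size, learning rate, and momentum coefficient below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = np.diag(np.linspace(1.0, 4.0, d))   # strongly convex quadratic: eigenvalues in [1, 4]
x_star = rng.normal(size=d)             # true minimizer

def stochastic_grad(x, batch_size):
    """Unbiased mini-batch gradient of f(x) = 0.5 * (x - x_star)' A (x - x_star)."""
    full_grad = A @ (x - x_star)
    noise = rng.normal(scale=0.5, size=(batch_size, d)).mean(axis=0)
    return full_grad + noise

def sgdm_polyak(steps=2000, batch_size=64, lr=0.05, beta=0.9):
    """Heavy-ball SGDM with a running Polyak average of the iterates."""
    x = np.zeros(d)
    v = np.zeros(d)        # momentum buffer
    avg = np.zeros(d)      # Polyak average of the iterates
    for t in range(1, steps + 1):
        g = stochastic_grad(x, batch_size)
        v = beta * v + g           # heavy-ball momentum update
        x = x - lr * v
        avg += (x - avg) / t       # online average: avg_t = (1/t) * sum of iterates
    return x, avg

x_last, x_avg = sgdm_polyak()
print("last-iterate error:", np.linalg.norm(x_last - x_star))
print("averaged error:    ", np.linalg.norm(x_avg - x_star))
```

On this toy problem the averaged iterate typically lands closer to the minimizer than the last iterate, which is consistent with the variance-reduction role of Polyak averaging that underlies the asymptotic-normality result for the averaged SGDM.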
Keywords
stochastic gradient descent, momentum, asymptotic normality, acceleration, finite-sample