The stochastic Ravine accelerated gradient method with general extrapolation coefficients
arXiv (2024)
Abstract
In a real Hilbert space setting, we study the convergence properties
of the stochastic Ravine accelerated gradient method for convex differentiable
optimization. We consider the general form of this algorithm where the
extrapolation coefficients can vary with each iteration, and where the
evaluation of the gradient is subject to random errors. This general treatment
models a breadth of practical algorithms and numerical implementations. We show
that, under proper tuning of the extrapolation parameters, and when the error
variance associated with the gradient evaluations, or the step-size sequence,
vanishes sufficiently fast, the Ravine method provides fast convergence of the
values both in expectation and almost surely. We also improve the convergence
rates from O(·) to o(·). Moreover, we show an almost sure summability property
of the gradients, which implies fast convergence of the gradients towards
zero. This property reflects the fact that the high-resolution ODE of the
Ravine method includes a Hessian-driven damping term. When the space is also
separable, our analysis also allows us to establish almost sure weak convergence
of the sequence of iterates provided by the algorithm. We finally specialize
the analysis to different parameter choices, including a vanishing and a
constant (heavy ball method with friction) damping parameter, and present a
comprehensive landscape of the tradeoffs in speed and accuracy associated with
these parameter choices and with statistical properties of the sequence of
errors in the gradient computations. We provide a thorough discussion of the
similarities and differences with the Nesterov accelerated gradient method,
which satisfies similar asymptotic convergence rates.
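To make the iteration concrete, below is a minimal sketch, assuming the common formulation of the Ravine method in which a (noisy) gradient step is taken first and the extrapolation is then applied to the resulting iterates; this is not the authors' code, and the names (`stochastic_ravine`, `alpha`, `noise_scale`) are illustrative assumptions.

```python
# Minimal sketch (assumed formulation, not the paper's implementation) of the
# stochastic Ravine iteration with general extrapolation coefficients:
#   z_k     = y_k - s * (grad f(y_k) + e_k)    (gradient step with error e_k)
#   y_{k+1} = z_k + alpha_k * (z_k - z_{k-1})  (extrapolation of the z-iterates)
import numpy as np

def stochastic_ravine(grad, y0, step, alpha, noise_scale, n_iters, rng=None):
    """Run the Ravine iteration with per-iteration extrapolation coefficients
    alpha(k) and Gaussian gradient noise of scale noise_scale(k); a vanishing
    noise_scale models the paper's condition that the error variance decays."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y0, dtype=float)
    z_prev = y.copy()
    for k in range(n_iters):
        # Noisy gradient evaluation at the current point y_k.
        noisy_grad = grad(y) + noise_scale(k) * rng.standard_normal(y.shape)
        z = y - step * noisy_grad           # gradient step
        y = z + alpha(k) * (z - z_prev)     # extrapolate the last two z-iterates
        z_prev = z
    return z_prev

# Example: a quadratic objective, Nesterov-style coefficients alpha_k = k/(k+3)
# (vanishing damping), and gradient-noise scale decaying like 1/k^{1.1} so the
# error variances are summable.
if __name__ == "__main__":
    A = np.diag([1.0, 10.0])
    grad = lambda y: A @ y
    z = stochastic_ravine(grad, y0=np.ones(2), step=0.05,
                          alpha=lambda k: k / (k + 3),
                          noise_scale=lambda k: 1.0 / (k + 1) ** 1.1,
                          n_iters=2000)
    print(z)  # should be close to the minimizer at the origin
```

For comparison, the Nesterov accelerated gradient scheme applies the same two operations in the opposite order: it first extrapolates the iterates and then takes the gradient step at the extrapolated point; a constant alpha(k) instead gives the heavy ball with friction regime discussed in the abstract.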