A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness
arXiv (2024)
Abstract
Classical convergence analyses for optimization algorithms rely on the
widely-adopted uniform smoothness assumption. However, recent experimental
studies have demonstrated that many machine learning problems exhibit
non-uniform smoothness, meaning the smoothness factor is a function of the
model parameter instead of a universal constant. In particular, it has been
observed that the smoothness grows with respect to the gradient norm along the
training trajectory. Motivated by this phenomenon, the recently introduced
(L_0, L_1)-smoothness generalizes traditional L-smoothness and captures this
positive relationship between smoothness and gradient norm. Under this type of
non-uniform smoothness, the existing literature has designed stochastic
first-order algorithms that use gradient clipping to achieve the optimal
𝒪(ϵ^-3) sample complexity for finding an ϵ-approximate first-order
stationary point.
Nevertheless, studies of quasi-Newton methods in this setting are still
lacking. Given the higher accuracy and greater robustness that quasi-Newton
methods offer, in this paper we propose a fast stochastic quasi-Newton method
for problems with non-uniform smoothness. By leveraging gradient clipping and
variance reduction, our algorithm achieves the best-known 𝒪(ϵ^-3) sample
complexity and enjoys a convergence speedup with simple hyperparameter tuning.
Our numerical experiments show that the proposed algorithm outperforms
state-of-the-art approaches.
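To illustrate the gradient-clipping ingredient the abstract mentions, here is a minimal sketch of a clipped gradient step on a function that is (L_0, L_1)-smooth but not uniformly L-smooth: f(x) = ||x||^4, whose Hessian norm grows with the gradient norm. This is a generic first-order illustration under assumed hyperparameters (`eta`, `gamma`), not the paper's quasi-Newton algorithm, whose update is not given in the abstract.

```python
import numpy as np

def clipped_gd(grad_fn, x0, eta=0.05, gamma=1.0, iters=200):
    """Gradient descent with norm clipping.

    The effective step size is eta * min(1, gamma / ||g||), so when the
    gradient (and hence the local smoothness) is large, the update length is
    capped at eta * gamma instead of blowing up.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_fn(x)
        gnorm = np.linalg.norm(g)
        scale = min(1.0, gamma / gnorm) if gnorm > 0 else 0.0
        x = x - eta * scale * g
    return x

# f(x) = ||x||^4 has gradient 4 ||x||^2 x; its curvature grows with the
# gradient norm, the hallmark of (L_0, L_1)-smoothness.
grad = lambda x: 4.0 * np.dot(x, x) * x

x_final = clipped_gd(grad, x0=[2.0, -1.0])
print(np.linalg.norm(grad(x_final)))  # gradient norm shrinks toward zero
```

Far from the optimum the clipped update behaves like normalized gradient descent (fixed step length eta * gamma); near the optimum the clipping is inactive and the method reduces to plain gradient descent, which is the adaptivity that makes clipping well suited to non-uniform smoothness.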