Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions
arXiv (2024)
Abstract
The performance of optimization methods is often tied to the spectrum of the
objective's Hessian. Yet conventional assumptions, such as smoothness, often do
not enable us to make fine-grained convergence statements, particularly not
for non-convex problems. Striving for a more intricate characterization of
complexity, we introduce a new concept termed graded non-convexity. This
allows us to partition the class of non-convex problems into a nested chain of
subclasses. Interestingly, many traditional non-convex objectives, including
partially convex problems, matrix factorizations, and neural networks, fall
within these subclasses. As a second contribution, we propose gradient methods
with spectral preconditioning, which employ inexact top eigenvectors of the
Hessian to address the ill-conditioning of the problem, contingent on the
grade. Our analysis reveals that these new methods provide provably superior
convergence rates compared to basic gradient descent on the applicable problem
classes, particularly when large gaps exist between the top eigenvalues of the
Hessian. Our theory is validated by numerical experiments on several
practical machine learning problems.
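To make the idea of spectral preconditioning concrete, here is a minimal illustrative sketch (not the paper's exact algorithm): a gradient step that uses the top-k eigenvectors of the Hessian to rescale the gradient in the dominant curvature directions, so that the step size is no longer limited by the largest eigenvalues. The function name `spectral_precondition_step`, the choice of k, and the quadratic test objective are all assumptions made for illustration.

```python
import numpy as np

def spectral_precondition_step(grad, hessian, k, lr):
    """One gradient step with a rank-k spectral preconditioner (illustrative).

    The gradient component in the span of the top-k Hessian eigenvectors is
    rescaled by the inverse eigenvalues; the orthogonal remainder is scaled
    by the (k+1)-th eigenvalue, so convergence depends on the spectrum below
    the top-k gap rather than on the largest eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eigh(hessian)
    # eigh returns eigenvalues in ascending order; take the top-k pairs.
    top_vals = eigvals[-k:]
    top_vecs = eigvecs[:, -k:]
    coeffs = top_vecs.T @ grad
    top_part = top_vecs @ (coeffs / top_vals)          # damp dominant directions
    rest_part = (grad - top_vecs @ coeffs) / eigvals[-(k + 1)]
    return -lr * (top_part + rest_part)

# Ill-conditioned quadratic f(x) = 0.5 * x^T H x with a large eigengap,
# where plain gradient descent would need lr < 2/1000 to be stable.
H = np.diag([1000.0, 100.0, 1.0, 0.5])
x = np.ones(4)
for _ in range(50):
    x = x + spectral_precondition_step(H @ x, H, k=2, lr=0.9)
print(np.linalg.norm(x))  # near the minimizer x* = 0 after 50 steps
```

On this toy quadratic the preconditioned iteration contracts every coordinate at a rate governed by the small eigenvalues only, which mirrors the abstract's claim that the gain is largest when the top of the spectrum has large gaps.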