A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions
CoRR (2024)
Abstract
We propose an algorithm for optimizing the parameters of single hidden layer
neural networks. Specifically, we derive a blockwise difference-of-convex (DC)
functions representation of the objective function. Based on the latter, we
propose a block coordinate descent (BCD) approach that we combine with a
tailored difference-of-convex functions algorithm (DCA). We prove global
convergence of the proposed algorithm. Furthermore, we mathematically analyze
the convergence rate of parameters and the convergence rate in value (i.e., the
training loss). We give conditions under which our algorithm converges linearly
or even faster depending on the local shape of the loss function. We confirm
our theoretical derivations numerically and compare our algorithm against
state-of-the-art gradient-based solvers in terms of both training loss and test
loss.
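To make the abstract's two algorithmic ingredients concrete, the following is a minimal, illustrative sketch of block coordinate descent (BCD) over the two weight blocks of a single-hidden-layer network, with a DCA step on the hidden-layer block. The DC split used here (adding and subtracting a quadratic, a generic prox-linear surrogate) and all hyperparameters are our own assumptions for illustration; they are not the tailored blockwise DC representation derived in the paper.

```python
"""Illustrative sketch (not the paper's exact method): BCD over the two
weight blocks of a single-hidden-layer network, with a DCA step on the
hidden-layer block.

Assumptions (ours, for illustration only):
  * squared loss, tanh activation, single output;
  * the generic DC split  L(W) = [L(W) + rho/2 ||W||^2] - [rho/2 ||W||^2],
    which is convex-minus-convex whenever L is rho-weakly convex on the
    block.  The paper derives a tailored blockwise DC representation;
    this prox-linear split is only a stand-in.
"""
import numpy as np
from scipy.optimize import minimize


def forward(X, W, v):
    """Single hidden layer: inputs X (n, d) -> hidden tanh(X W) -> output."""
    return np.tanh(X @ W) @ v


def block_loss_W(w_flat, X, y, v, shape):
    """Training loss as a function of the hidden-layer block W only."""
    W = w_flat.reshape(shape)
    r = forward(X, W, v) - y
    return 0.5 * np.mean(r ** 2)


def dca_step_W(W, X, y, v, rho):
    """One DCA step on the hidden block under the generic split above.

    Linearizing the concave part -rho/2 ||W||^2 at the current iterate
    leaves the (assumed convex) subproblem
        min_W  L(W) + rho/2 ||W - W_k||^2,
    solved approximately here with L-BFGS.
    """
    shape, w0 = W.shape, W.ravel()

    def surrogate(w_flat):
        return (block_loss_W(w_flat, X, y, v, shape)
                + 0.5 * rho * np.sum((w_flat - w0) ** 2))

    res = minimize(surrogate, w0, method="L-BFGS-B")
    return res.x.reshape(shape)


def bcd_dca(X, y, width=16, rho=1.0, iters=30, seed=0):
    """Alternate between the two blocks: closed-form least squares for the
    output weights v, then a DCA step for the hidden weights W."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d, width)) * 0.5
    v = np.zeros(width)
    for _ in range(iters):
        H = np.tanh(X @ W)                         # hidden features for fixed W
        v = np.linalg.lstsq(H, y, rcond=None)[0]   # convex (linear LS) block
        W = dca_step_W(W, X, y, v, rho)            # DCA step on the hidden block
    return W, v


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 3))
    y = np.sin(X @ np.array([1.0, -2.0, 0.5]))     # toy regression target
    W, v = bcd_dca(X, y)
    print("final training loss:", 0.5 * np.mean((forward(X, W, v) - y) ** 2))
```

The output-layer block is updated in closed form because, for squared loss with the hidden activations fixed, it is an ordinary linear least-squares problem; only the hidden-layer block needs the DC machinery in this toy setup.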