Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
CoRR (2024)
Abstract
In this paper, we study the optimality gap between two-layer ReLU networks
regularized with weight decay and their convex relaxations. We show that when
the training data is random, the relative optimality gap between the original
problem and its relaxation can be bounded by a factor of O(√(log n)),
where n is the number of training samples. A simple application leads to a
tractable polynomial-time algorithm that is guaranteed to solve the original
non-convex problem up to a logarithmic factor. Moreover, under mild
assumptions, we show that with random initialization of the parameters, local
gradient methods almost surely converge to a point with low training loss.
Our result is an exponential improvement over existing results and sheds new
light on why local gradient methods work well.
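
Read concretely (our notation; an assumed formalization of the claim above), let p⋆ denote the optimal value of the non-convex weight-decay problem and p⋆_cvx the optimal value of its convex relaxation. Since relaxing a minimization problem can only lower its optimal value, the bound says that, with high probability over the random training data,

```latex
p^{\star}_{\mathrm{cvx}} \;\le\; p^{\star} \;\le\; O\!\big(\sqrt{\log n}\big)\, p^{\star}_{\mathrm{cvx}}.
```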
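
As a concrete illustration, the sketch below solves one common form of such a relaxation: a group-ℓ2-regularized convex program over a subsample of ReLU activation patterns (hyperplane arrangements) of the training data. Everything here — the sampled-arrangement construction, the penalty `beta`, and the problem sizes — is an illustrative assumption, not the paper's exact formulation or code.

```python
# Minimal sketch (illustrative, not the authors' code): a convex relaxation of
# two-layer ReLU training with weight decay, using sampled activation patterns.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, P = 50, 5, 20                      # samples, input dim, sampled patterns
X = rng.standard_normal((n, d))          # random training data
y = rng.standard_normal(n)               # targets
beta = 1e-3                              # weight-decay strength (assumed value)

# Sample ReLU activation patterns D_i = 1{X u_i >= 0} from random directions.
D = [(X @ rng.standard_normal(d) >= 0).astype(float) for _ in range(P)]

# Convex relaxation (group-lasso form, cone constraints dropped):
#   minimize ||sum_i diag(D_i) X u_i - y||^2 + beta * sum_i ||u_i||_2
U = cp.Variable((d, P))
fit = sum(cp.multiply(D[i], X @ U[:, i]) for i in range(P)) - y
objective = cp.sum_squares(fit) + beta * sum(cp.norm(U[:, i], 2) for i in range(P))
problem = cp.Problem(cp.Minimize(objective))
problem.solve()

# The optimal value lower-bounds the non-convex optimum; per the paper's
# result, the non-convex optimum is within an O(sqrt(log n)) factor of it
# with high probability for random data.
print("relaxation optimal value:", problem.value)
```

Roughly speaking, a feasible two-layer ReLU network can then be recovered from a relaxation solution by a rounding step that respects the activation patterns; combining such a step with the convex solve is what yields a polynomial-time algorithm with the logarithmic approximation guarantee.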