Spurious local minima in neural networks: a critical view

arXiv: Learning (2018)

Abstract
We investigate the loss surface of nonlinear neural networks. We prove that even for networks with a single hidden layer and the slightest nonlinearity, spurious local minima can exist. Our results thus indicate that, in general, the absence of spurious local minima is a property limited to deep linear networks. Specifically, for ReLU(-like) networks we prove that, for almost all practical datasets (in contrast to previous results), there exist infinitely many spurious local minima. We also present a counterexample for more general activation functions (such as sigmoid, tanh, arctan, ReLU, etc.) in which a local minimum exists that is strictly inferior to the global minimum. Our results make the least restrictive assumptions among existing results on local optimality in neural networks. We complete our discussion with a comprehensive characterization of global optimality for deep linear networks; these results unify and subsume other results on this topic.
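To make the notion of a spurious local minimum concrete, the sketch below is a minimal numerical illustration, not the construction from the paper: the tiny dataset, the width-2 network, and all parameter choices are assumptions made for this example. It places a one-hidden-layer ReLU network at a point where every hidden unit is inactive on the data, checks numerically that small perturbations cannot reduce the loss, and contrasts this with a parameter setting that fits the data exactly.

```python
# Illustrative sketch (not the paper's construction) of a spurious local
# minimum in a one-hidden-layer ReLU network on a tiny 1-D dataset.
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression dataset that a ReLU network can fit exactly.
x = np.array([-1.0, 0.0, 1.0])
y = np.array([0.0, 0.0, 1.0])

def predict(params, x):
    """f(x) = sum_j v_j * ReLU(w_j * x + b_j) + c, hidden width 2."""
    w, b, v, c = params
    hidden = np.maximum(w[:, None] * x[None, :] + b[:, None], 0.0)  # (2, n)
    return hidden.T @ v + c

def loss(params, x, y):
    return np.mean((predict(params, x) - y) ** 2)

# Candidate point: every hidden unit is "dead" (strictly negative
# pre-activation on all data points) and the output bias equals the mean
# target. Small perturbations keep the units dead, so the output stays
# constant and the loss cannot decrease locally, yet the global minimum is 0.
dead = (np.zeros(2), -np.ones(2), rng.standard_normal(2), y.mean())
base = loss(dead, x, y)
print("loss at candidate point:", base)  # 2/9 ~= 0.222

# Probe many random small perturbations; none should reduce the loss.
for _ in range(10_000):
    perturbed = tuple(p + 1e-3 * rng.standard_normal(np.shape(p)) for p in dead)
    assert loss(perturbed, x, y) >= base - 1e-12

# A parameter setting that fits the data exactly, showing the candidate
# point is strictly worse than the global minimum.
exact = (np.array([1.0, 0.0]), np.array([0.0, 0.0]), np.array([1.0, 0.0]), 0.0)
print("loss at an exact fit:", loss(exact, x, y))  # 0.0
```

The "dead unit" configuration used here is one simple way such a point can arise with ReLU activations; the paper's results cover far broader settings, including arbitrarily small nonlinearities and general activation functions.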