Time/Accuracy Tradeoffs for Learning a ReLU with Respect to Gaussian Marginals

Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

Abstract
We consider the problem of computing the best-fitting ReLU with respect to square loss on a training set whose examples are drawn from a spherical Gaussian distribution (the labels can be arbitrary). Let opt < 1 be the population loss of the best-fitting ReLU. We prove:

- Finding a ReLU with square loss opt + ε is as hard as learning sparse parities with noise, a problem widely believed to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and it implies, unconditionally, that gradient descent cannot converge to the global minimum in polynomial time.
- There is an efficient approximation algorithm for finding the best-fitting ReLU that achieves error O(opt^(2/3)). The algorithm uses a novel reduction to learning noisy halfspaces with respect to 0/1 loss.

Prior work due to Soltanolkotabi [Sol17] showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals if the training set is exactly labeled by a ReLU.
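To make the objective concrete, here is a minimal sketch (not the paper's algorithm) of the empirical square loss of a ReLU on spherical-Gaussian examples, together with plain gradient descent on that loss, the procedure analyzed by [Sol17] in the realizable case. All function names, step sizes, and dimensions below are illustrative assumptions.

```python
import numpy as np

def relu_square_loss(w, X, y):
    """Empirical square loss of the ReLU x -> max(<w, x>, 0) on examples (X, y)."""
    preds = np.maximum(X @ w, 0.0)
    return np.mean((preds - y) ** 2)

def fit_relu_gd(X, y, lr=0.1, steps=1000, seed=0):
    """Plain gradient descent on the (non-convex) square loss.
    In the realizable case (labels exactly from a ReLU, Gaussian X), [Sol17]
    shows gradient descent succeeds; with arbitrary labels, the hardness
    result above rules out reaching the global minimum in polynomial time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d) / np.sqrt(d)
    for _ in range(steps):
        z = X @ w
        preds = np.maximum(z, 0.0)
        # Gradient of the mean squared error; ReLU derivative is 1{z > 0}.
        grad = (2.0 / n) * (X * ((preds - y) * (z > 0)).reshape(-1, 1)).sum(axis=0)
        w -= lr * grad
    return w

# Spherical Gaussian examples; labels realizable by a hidden unit-norm ReLU.
rng = np.random.default_rng(1)
d, n = 10, 5000
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0)
w_hat = fit_relu_gd(X, y)
print("square loss of GD solution:", relu_square_loss(w_hat, X, y))
```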
Keywords
polynomial time, global minimum, computationally intractable