On The Learnability Of Fully-Connected Neural Networks
Artificial Intelligence and Statistics (AISTATS), Vol. 54, 2017
Abstract
Despite the empirical success of deep neural networks, there is limited theoretical understanding of the learnability of these models with respect to polynomial-time algorithms. In this paper, we characterize the learnability of fully-connected neural networks via both positive and negative results. We focus on ℓ1-regularized networks, where the ℓ1-norm of the incoming weights of every neuron is assumed to be bounded by a constant B > 0. Our first result shows that such networks are properly learnable in poly(n, d, exp(1/ε²)) time, where n and d are the sample size and the input dimension, and ε > 0 is the gap to optimality. The bound is achieved by repeatedly sampling over a low-dimensional manifold so as to ensure approximate optimality, but avoids the exp(d) cost of exhaustively searching over the parameter space. We also establish a hardness result showing that the exponential dependence on 1/ε is unavoidable unless RP = NP. Our second result shows that the exponential dependence on 1/ε can be avoided by exploiting the underlying structure of the data distribution. In particular, if the positive and negative examples can be separated with margin γ > 0 by an unknown neural network, then the network can be learned in poly(n, d, 1/ε) time. The bound is achieved by an ensemble method which uses the first algorithm as a weak learner. We further show that the separability assumption can be weakened to tolerate noisy labels. Finally, we show that the exponential dependence on 1/γ is unimprovable under a certain cryptographic assumption.
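To make the two ingredients named in the abstract concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): it enforces the ℓ1-norm budget B on the incoming weights of every neuron by projecting each row of a weight matrix onto the ℓ1-ball, and wraps a generic weak learner in an AdaBoost-style ensemble, mirroring how the second result builds on the first under the margin-γ separability assumption. The names `weak_learner`, `enforce_l1_budget`, and `B` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def project_l1_ball(w, B):
    """Euclidean projection of vector w onto {v : ||v||_1 <= B}."""
    if np.abs(w).sum() <= B:
        return w
    u = np.sort(np.abs(w))[::-1]                 # sorted magnitudes, descending
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(w) + 1) > (cssv - B))[0][-1]
    theta = (cssv[rho] - B) / (rho + 1.0)
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def enforce_l1_budget(weight_matrices, B):
    """Clip the incoming-weight ell_1 norm of every neuron (each row) to B."""
    return [np.apply_along_axis(project_l1_ball, 1, W, B) for W in weight_matrices]

def boost(X, y, weak_learner, rounds=50):
    """AdaBoost-style wrapper: weak_learner(X, y, sample_weights) must return a
    classifier h(X) -> {-1, +1} whose weighted error is slightly below 1/2,
    as the margin assumption in the abstract is meant to guarantee."""
    n = len(y)
    D = np.full(n, 1.0 / n)                      # distribution over examples
    hypotheses, alphas = [], []
    for _ in range(rounds):
        h = weak_learner(X, y, D)
        pred = h(X)
        err = np.clip(D @ (pred != y), 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        D *= np.exp(-alpha * y * pred)           # upweight misclassified points
        D /= D.sum()
        hypotheses.append(h)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hypotheses)))
```

In this sketch the weak learner is a black box; in the paper's setting it would be instantiated by the first algorithm (the manifold-sampling learner for ℓ1-bounded networks), and the boosting loop is what removes the exp(1/ε²) cost when the data are γ-margin separable.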