Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks.

arXiv: Learning (2018)

Abstract
We analyze Gradient Descent applied to learning a bounded target function on $n$ real-valued inputs by training a neural network with a single hidden layer of nonlinear gates. Our main finding is that GD starting from a randomly initialized network converges in mean squared loss to the minimum error (in 2-norm) of the best approximation of the target function using a polynomial of degree at most $k$. Moreover, the size of the network and number of iterations needed are both bounded by $n^{O(k)}$. The core of our analysis is the following existence theorem, which is of independent interest: for any $\epsilon > 0$, any bounded function that has a degree-$k$ polynomial approximation with error $\epsilon_0$ (in 2-norm) can be approximated to within error $\epsilon_0 + \epsilon$ as a linear combination of $n^{O(k)} \cdot \mathrm{poly}(1/\epsilon)$ randomly chosen gates from any class of gates whose corresponding activation function has nonzero coefficients in its harmonic expansion for degrees up to $k$. In particular, this applies to training networks of unbiased sigmoids and ReLUs.
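As a rough illustration of the setting (a minimal sketch, not the authors' construction or parameter choices), the code below trains only the output weights of a one-hidden-layer ReLU network with randomly initialized, fixed hidden weights by full-batch gradient descent on mean squared loss, matching the "linear combination of randomly chosen gates" viewpoint. The width, step size, sample size, and synthetic target are arbitrary illustrative choices.

```python
import numpy as np

# Sketch: randomly initialized one-hidden-layer ReLU network; only the
# output weights are trained by full-batch gradient descent on MSE.
rng = np.random.default_rng(0)

n_samples, n_inputs, width = 200, 5, 512
X = rng.standard_normal((n_samples, n_inputs))
y = np.sin(X[:, 0])                                # a bounded target function

W = rng.standard_normal((n_inputs, width)) / np.sqrt(n_inputs)  # random (unbiased) gates, kept fixed
a = np.zeros(width)                                              # trainable output weights

lr, steps = 0.1, 2000
for _ in range(steps):
    H = np.maximum(X @ W, 0.0)                     # ReLU gate outputs
    err = H @ a / width - y                        # residual of current linear combination
    loss = np.mean(err ** 2)
    grad_a = (2.0 / (n_samples * width)) * (H.T @ err)   # gradient of MSE w.r.t. output weights
    a -= lr * grad_a

print(f"final mean squared loss: {loss:.4f}")
```

Because the hidden weights stay fixed, the loss is convex in the output weights, so plain gradient descent suffices for this toy setup; the paper's analysis concerns the full randomly initialized network.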