A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
International Conference on Learning Representations (ICLR), 2019.
We analyze the speed of convergence to a global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data. Convergence at a linear rate is guaranteed when the following hold: (i) the dimensions of the hidden layers are at least the minimum of the input …
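The setting the abstract describes can be sketched as a minimal NumPy experiment: gradient descent on the $\ell_2$ loss for a depth-$N$ linear network over whitened data, with a balanced (identity) initialization. All dimensions, the step size, and the near-identity target map are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d-dimensional inputs/outputs, n samples.
d, n = 5, 200
X = rng.standard_normal((d, n))

# Whiten the data so the empirical covariance X X^T / n is the identity.
C = X @ X.T / n
evals, evecs = np.linalg.eigh(C)
X = evecs @ np.diag(evals ** -0.5) @ evecs.T @ X

# Target linear map, chosen near the identity so the identity init starts
# close to the optimum (an illustrative choice, not from the paper).
W_true = np.eye(d) + 0.2 * rng.standard_normal((d, d)) / np.sqrt(d)
Y = W_true @ X

# Depth-N linear network x -> W_N ... W_1 x. The identity init is exactly
# balanced: W_{j+1}^T W_{j+1} = W_j W_j^T at initialization.
N = 3
Ws = [np.eye(d) for _ in range(N)]

def loss(Ws):
    pred = X
    for W in Ws:
        pred = W @ pred
    return 0.5 / n * np.sum((pred - Y) ** 2)

eta = 0.05  # step size (hypothetical; small enough for stability here)
history = []
for step in range(1000):
    acts = [X]
    for W in Ws:
        acts.append(W @ acts[-1])  # forward pass, caching activations
    g = (acts[-1] - Y) / n         # gradient of the loss w.r.t. the output
    grads = [None] * N
    for j in range(N - 1, -1, -1):
        grads[j] = g @ acts[j].T   # gradient w.r.t. W_j
        g = Ws[j].T @ g            # backpropagate to the previous layer
    for j in range(N):
        Ws[j] -= eta * grads[j]
    history.append(loss(Ws))
```

With whitened data the loss equals $\tfrac{1}{2}\|W_N \cdots W_1 - W_{\text{true}}\|_F^2$, and under this benign initialization the recorded `history` shrinks geometrically, i.e. at a linear rate.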