A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

International Conference on Learning Representations (ICLR), 2019.

Abstract:

We analyze the speed of convergence to a global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data. Convergence at a linear rate is guaranteed when the following hold: (i) the dimensions of the hidden layers are at least the minimum of the input …
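The setting the abstract describes can be sketched numerically: a deep linear network $W_N \cdots W_1$ trained by plain gradient descent on the $\ell_2$ loss over whitened inputs. The sizes, initialization scale, and step size below are illustrative choices, not values from the paper; with whitened data the loss reduces to $\tfrac{1}{2}\|W_N \cdots W_1 - \Phi\|_F^2$ for the target map $\Phi$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (our choice, not the paper's). Condition (i) of the
# abstract asks each hidden width to be >= min(d_in, d_out).
d_in, d_out, n, N = 8, 4, 100, 3
dims = [d_in, 8, 8, d_out]

# Whiten the inputs so that (1/n) X X^T = I_{d_in}.
X = rng.standard_normal((d_in, n))
U, _, Vt = np.linalg.svd(X, full_matrices=False)
X = np.sqrt(n) * U @ Vt

Phi = rng.standard_normal((d_out, d_in))  # ground-truth linear map
Y = Phi @ X

# Layers of the deep linear network x -> W_N ... W_1 x.
Ws = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
      for i in range(N)]

def end_to_end(Ws):
    """Product W_N ... W_1, the end-to-end linear map."""
    W = np.eye(d_in)
    for Wi in Ws:
        W = Wi @ W
    return W

def loss(Ws):
    # Over whitened data this equals 0.5 * ||W_N...W_1 - Phi||_F^2.
    return 0.5 / n * np.linalg.norm(end_to_end(Ws) @ X - Y) ** 2

init_loss = loss(Ws)
eta = 0.01  # step size (illustrative)
for _ in range(1000):
    G = (end_to_end(Ws) @ X - Y) @ X.T / n  # gradient w.r.t. end-to-end map
    grads = []
    for j in range(N):
        left = np.eye(dims[j + 1])
        for Wi in Ws[j + 1:]:
            left = Wi @ left            # left = W_N ... W_{j+1}
        right = np.eye(d_in)
        for Wi in Ws[:j]:
            right = Wi @ right          # right = W_{j-1} ... W_1
        grads.append(left.T @ G @ right.T)  # chain rule for layer j
    Ws = [Wj - eta * Gj for Wj, Gj in zip(Ws, grads)]

final_loss = loss(Ws)
```

Gradients for all layers are computed from the same iterate before any layer is updated, matching simultaneous (full-batch) gradient descent rather than a coordinate-wise sweep.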
