Width Provably Matters in Optimization for Deep Linear Neural Networks

International Conference on Machine Learning, pp. 1655-1664, 2019.

Abstract:

We prove that for an $L$-layer fully-connected linear neural network, if the width of every hidden layer is $\tilde{\Omega}(L \cdot r \cdot d_{\mathrm{out}} \cdot \kappa^3)$, where $r$ and $\kappa$ are the rank and the condition number of the input data, and $d_{\mathrm{out}}$ is the output dimension, then gradient descent with Gaussian random initialization...
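To make the setting in the abstract concrete, here is a minimal sketch (not the paper's own code): an $L$-layer fully-connected linear network $f(x) = W_L \cdots W_1 x$ with Gaussian random initialization, trained by gradient descent on a synthetic least-squares task. All dimensions, the hidden width, and the step size below are illustrative assumptions, not values from the paper's theorem.

```python
import numpy as np

# Illustrative sketch only: deep linear network W_L ... W_1 trained by
# gradient descent from Gaussian random initialization on synthetic data.
rng = np.random.default_rng(0)
d_in, d_out, width, L, n = 4, 2, 16, 3, 50   # arbitrary illustrative sizes

X = rng.standard_normal((d_in, n))            # input data (rank r = d_in here)
W_star = rng.standard_normal((d_out, d_in))   # ground-truth linear map
Y = W_star @ X                                # realizable targets

# Gaussian init, scaled by 1/sqrt(fan_in) so the product map stays bounded
dims = [d_in] + [width] * (L - 1) + [d_out]
Ws = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
      for i in range(L)]

def loss(Ws):
    P = np.eye(d_in)
    for W in Ws:
        P = W @ P                             # end-to-end map W_L ... W_1
    return 0.5 * np.linalg.norm(P @ X - Y) ** 2 / n

init_loss = loss(Ws)
eta = 0.005
for _ in range(2000):
    # forward pass, caching each layer's activations
    acts = [X]
    for W in Ws:
        acts.append(W @ acts[-1])
    grad_out = (acts[-1] - Y) / n             # dLoss / d(output)
    # backward pass: gradient w.r.t. each W_i, then a gradient step
    for i in range(L - 1, -1, -1):
        grad_W = grad_out @ acts[i].T
        grad_out = Ws[i].T @ grad_out         # propagate before updating W_i
        Ws[i] = Ws[i] - eta * grad_W

final_loss = loss(Ws)
print(f"loss: {init_loss:.4f} -> {final_loss:.4f}")
```

The paper's result concerns when such gradient descent provably drives the training loss to a global minimum; the width condition $\tilde{\Omega}(L \cdot r \cdot d_{\mathrm{out}} \cdot \kappa^3)$ quantifies how wide the hidden layers must be for that guarantee.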
