Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

ICLR, 2020.

Summary:

We study the effect of the initial parameter values of deep linear neural networks on the convergence time of gradient descent.

Abstract:

The selection of initial parameter values for gradient-based optimization of deep neural networks is one of the most impactful hyperparameter choices in deep learning systems, affecting both convergence times and model performance. Yet despite significant empirical and theoretical analysis, relatively little has been proved about the concrete effects of different initialization schemes.
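Since the listing does not include code, the following is a minimal sketch (not the authors' implementation) that contrasts the two initialization schemes the paper analyzes: training a deep linear network by gradient descent from an orthogonal initialization versus a standard i.i.d. Gaussian initialization. The depth, width, learning rate, and synthetic regression data are illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)

def gaussian_init(depth, width):
    # i.i.d. Gaussian weights with variance 1/width (standard scaling).
    return [rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
            for _ in range(depth)]

def orthogonal_init(depth, width):
    # Orthogonal weights via QR decomposition of a Gaussian matrix.
    layers = []
    for _ in range(depth):
        q, r = np.linalg.qr(rng.normal(size=(width, width)))
        # Fix column signs so Q is uniform over the orthogonal group.
        layers.append(q * np.sign(np.diag(r)))
    return layers

def train(weights, x, y, lr=1e-3, steps=500):
    # Full-batch gradient descent on the mean squared loss of the
    # end-to-end linear map W_L ... W_1 x.
    losses = []
    n = x.shape[1]
    for _ in range(steps):
        # Forward pass, keeping intermediate activations for backprop.
        acts = [x]
        for w in weights:
            acts.append(w @ acts[-1])
        err = (acts[-1] - y) / n
        losses.append(0.5 * n * np.sum(err ** 2))
        # Backward pass through the linear layers.
        grad = err
        for i in reversed(range(len(weights))):
            g_w = grad @ acts[i].T
            grad = weights[i].T @ grad
            weights[i] -= lr * g_w
    return losses

depth, width, n = 20, 64, 32
x = rng.normal(size=(width, n))
y = rng.normal(size=(width, n))  # synthetic regression targets

loss_orth = train(orthogonal_init(depth, width), x, y)
loss_gauss = train(gaussian_init(depth, width), x, y)
print("final loss, orthogonal init:", loss_orth[-1])
print("final loss, Gaussian init:  ", loss_gauss[-1])

In this toy setting the orthogonal initialization keeps the singular values of the end-to-end map well conditioned at initialization, which is the mechanism behind the faster convergence the paper studies; the specific widths and depths at which the gap appears are given by the paper's theorems, not by this sketch.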
