Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

ICLR, 2020.

Cited by: 39|Bibtex|Views305|Links
EI

Abstract:

Training large deep neural networks on massive datasets is  computationally very challenging. There has been recent surge in interest in using large batch stochastic optimization methods to tackle this issue. The most prominent algorithm in this line of research is LARS, which by  employing layerwise adaptive learning rates trains ResNet ...More

Code:

Data:

Your rating :
0

 

Tags
Comments