YellowFin and the Art of Momentum Tuning
arXiv preprint arXiv:1706.03471 (Machine Learning), 2018.
Hyperparameter tuning is one of the most time-consuming workloads in deep learning. State-of-the-art optimizers, such as AdaGrad, RMSProp, and Adam, reduce this labor by adaptively tuning an individual learning rate for each variable. Recently, researchers have shown renewed interest in simpler methods like momentum SGD, as they may yield …
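As a point of reference for the method the abstract discusses, here is a minimal sketch of momentum SGD (the heavy-ball form) with a single global learning rate and momentum, in contrast to the per-variable rates of AdaGrad-style optimizers. The names `grad_fn`, `lr`, `mu`, and `momentum_sgd` are illustrative assumptions, not from the paper.

```python
import numpy as np

def momentum_sgd(grad_fn, x0, lr=0.01, mu=0.9, steps=100):
    """Minimize a function via momentum SGD.

    grad_fn: returns the gradient at a point (illustrative placeholder).
    lr:      single global learning rate.
    mu:      momentum coefficient, shared across all variables.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)  # velocity: exponentially weighted sum of past gradients
    for _ in range(steps):
        v = mu * v - lr * grad_fn(x)  # accumulate momentum
        x = x + v                     # take the step
    return x

# Example: minimize f(x) = x^2, whose gradient is 2x; iterates spiral toward 0.
x_min = momentum_sgd(lambda x: 2 * x, x0=[5.0])
```

Note that only two scalars (`lr` and `mu`) govern the whole update, which is what makes hand-tuning or auto-tuning them tractable compared to per-parameter schemes.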