ADINE: An Adaptive Momentum Method for Stochastic Gradient Descent

CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (2018)

Abstract
Momentum-based learning algorithms are among the most successful learning algorithms in both convex and non-convex optimization. Two major momentum-based techniques that have achieved tremendous success in gradient-based optimization are Polyak's heavy ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter m, which is conventionally set to less than 1. Although the choice of m < 1 is justified only under very strong theoretical assumptions, it works well in practice. In this paper we propose a new momentum-based method, ADINE, which relaxes the constraint m < 1 and allows the learning algorithm to use an adaptive, higher momentum. We motivate this relaxation on m by experimentally verifying that a higher momentum (>= 1) can help escape saddle points much faster. ADINE uses this intuition to weigh previous updates more heavily, inherently setting a larger momentum parameter in the optimization method. To the best of our knowledge, the idea of using increased momentum is the first of its kind. We evaluate ADINE on deep neural networks and show that it helps the learning algorithm converge much faster without compromising on generalization error.
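The abstract describes relaxing the usual m < 1 constraint of heavy-ball momentum. Below is a minimal, hypothetical sketch illustrating only that idea: Polyak's heavy-ball update where the momentum factor is allowed to adaptively exceed 1 when progress stalls. The switching criterion, the parameter names (m_low, m_high, tol), and the function heavy_ball_adaptive are illustrative assumptions, not the authors' actual ADINE update rule, which is defined in the full paper.

```python
# Hypothetical sketch (NOT the exact ADINE rule): heavy-ball momentum where
# the momentum factor m may adaptively exceed 1 on stalled progress.
import numpy as np

def heavy_ball_adaptive(grad_fn, w0, lr=0.01, m_low=0.9, m_high=1.1,
                        steps=1000, tol=1e-3):
    """Heavy-ball SGD with a crude adaptive momentum switch.

    grad_fn : callable returning the (stochastic) gradient at w
    m_low   : standard momentum (< 1), used by default
    m_high  : relaxed momentum (>= 1), used when the gradient norm stops
              shrinking -- a stand-in for the paper's adaptive criterion
    """
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    prev_grad_norm = np.inf
    for _ in range(steps):
        g = grad_fn(w)
        g_norm = np.linalg.norm(g)
        # On a plateau / saddle-like region, weigh previous updates more
        # by switching to a momentum factor >= 1.
        m = m_high if g_norm > prev_grad_norm - tol else m_low
        v = m * v - lr * g          # Polyak heavy-ball velocity update
        w = w + v
        prev_grad_norm = g_norm
    return w

# Usage: minimize a toy quadratic f(w) = 0.5 * ||w||^2 (gradient is w).
w_star = heavy_ball_adaptive(lambda w: w, w0=np.ones(5))
```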
Keywords
Neural Networks, Deep Learning, Momentum, Non-Convex Optimization