Asynchrony begets Momentum, with an Application to Deep Learning
2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Abstract
Asynchronous methods are widely used in deep learning, but have limited theoretical justification when applied to non-convex problems. We give a simple argument that running stochastic gradient descent (SGD) in an asynchronous manner can be viewed as adding a momentum-like term to the SGD iteration. Our result does not assume convexity of the objective function, so is applicable to deep learning systems. We observe that a standard queuing model of asynchrony results in a form of momentum that is commonly used by deep learning practitioners. This forges a link between queuing theory and asynchrony in deep learning systems, which could be useful for systems builders. For convolutional neural networks, we experimentally validate that the degree of asynchrony directly correlates with the momentum, confirming our main result. Since asynchrony has better hardware efficiency, this result may shed light on when asynchronous execution is more efficient for deep learning systems.
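To make the abstract's central claim concrete, here is a minimal sketch of the momentum view of asynchronous SGD. The notation is illustrative rather than taken from the paper: \(\alpha\) is the step size, \(\tau_t\) the random staleness of the gradient applied at step \(t\), \(M\) the number of asynchronous workers, and the closed form \(\mu \approx 1 - 1/M\) assumes a memoryless (queuing-style) staleness model.

```latex
% Asynchronous SGD applies each gradient at a stale iterate:
\[
  x_{t+1} = x_t - \alpha \,\nabla f\!\left(x_{t-\tau_t}\right),
  \qquad \tau_t \ge 0 \ \text{(random staleness)}.
\]
% Taking expectations under a memoryless staleness model and
% rearranging yields a momentum-like recursion:
\[
  \mathbb{E}\!\left[x_{t+1}\right]
  = \mathbb{E}\!\left[x_t\right]
  + \mu \left(\mathbb{E}\!\left[x_t\right] - \mathbb{E}\!\left[x_{t-1}\right]\right)
  - \alpha\, \mathbb{E}\!\left[\nabla f(x_t)\right],
  \qquad \mu \approx 1 - \tfrac{1}{M},
\]
% so the degree of asynchrony (the number of workers M) acts as
% an implicit momentum term added to plain SGD.
```

In this reading, increasing the number of asynchronous workers plays the same role as raising the momentum coefficient in standard momentum SGD, which is consistent with the abstract's experimental claim that the degree of asynchrony correlates with momentum.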
Keywords
deep learning, asynchronous methods, nonconvex problems, stochastic gradient descent, momentum-like term, SGD iteration, deep learning systems, standard queuing model, queuing theory, convolutional neural networks, negative algorithmic momentum