Asynchrony begets Momentum, with an Application to Deep Learning
Allerton Conference on Communication, Control, and Computing (Allerton), pp. 997–1004, 2016.
Asynchronous methods are widely used in deep learning, but have limited theoretical justification when applied to non-convex problems. We show that running stochastic gradient descent (SGD) in an asynchronous manner can be viewed as adding a momentum-like term to the SGD iteration. Our result does not assume convexity of the objective function …
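A minimal sketch of the claimed correspondence, not taken from the paper: asynchronous SGD is simulated as SGD applied to stale parameter snapshots, and compared against plain heavy-ball momentum SGD on a toy quadratic. The geometric staleness model, the pairing mu = 1 - 1/M for M workers, and all identifiers (`async_sgd`, `momentum_sgd`, `stoch_grad`) are illustrative assumptions.

```python
import numpy as np

# Toy quadratic objective f(w) = 0.5 * w^T A w with gradient noise.
rng = np.random.default_rng(0)
A = np.diag([1.0, 0.2])

def stoch_grad(w):
    return A @ w + 0.01 * rng.standard_normal(2)

def async_sgd(steps=2000, lr=0.05, workers=4):
    """SGD where each update uses a parameter snapshot that is `delay`
    steps stale; geometric delays stand in for a queuing model of
    asynchrony (an assumption, not the paper's exact model)."""
    w_hist = [np.array([5.0, 5.0])]
    for t in range(steps):
        delay = min(rng.geometric(1.0 / workers) - 1, t)  # staleness
        w_stale = w_hist[-1 - delay]
        w_hist.append(w_hist[-1] - lr * stoch_grad(w_stale))
    return np.array(w_hist)

def momentum_sgd(steps=2000, lr=0.05, mu=0.75):
    """Heavy-ball momentum: w_{t+1} = w_t - lr*g(w_t) + mu*(w_t - w_{t-1})."""
    w_prev = w = np.array([5.0, 5.0])
    hist = [w]
    for _ in range(steps):
        w, w_prev = w - lr * stoch_grad(w) + mu * (w - w_prev), w
        hist.append(w)
    return np.array(hist)

# If asynchrony with M workers acts like implicit momentum mu = 1 - 1/M
# (here M=4 -> mu=0.75), the two loss traces should behave similarly.
for name, hist in [("async", async_sgd()), ("momentum", momentum_sgd())]:
    losses = 0.5 * np.einsum("ti,ij,tj->t", hist, A, hist)
    print(name, losses[::500])
```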