Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?
ICML, pp. 5965-5974, 2018.
One of the most widely used training methods for large-scale machine learning problems is distributed asynchronous stochastic gradient descent (DASGD). However, a key issue in its implementation is that of delays: when a worker node asynchronously contributes a gradient update to the master, the global model parameter may have changed, rendering this information stale.
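The staleness issue the abstract describes can be illustrated with a toy simulation. The sketch below is an illustrative assumption, not the paper's algorithm or experimental setup: workers read the master's parameter, compute a gradient of a simple quadratic objective, and the master applies that gradient only after a random delay, by which time the parameter has moved on. A decaying step size is used to absorb the staleness.

```python
import random

def dasgd_simulation(steps=2000, max_delay=5, lr0=0.5, seed=0):
    """Toy DASGD simulation on f(x) = x^2 with randomly delayed gradients.

    A worker reads the current parameter and computes a gradient, but the
    master applies it only `delay` steps later, so the update may be stale.
    The objective, step-size schedule, and delay model are illustrative
    assumptions for this sketch.
    """
    rng = random.Random(seed)
    x = 5.0        # master's global model parameter
    pending = []   # queue of (step_at_which_update_arrives, stale_gradient)
    for t in range(steps):
        # Worker reads the current parameter and computes its gradient ...
        grad = 2.0 * x                      # gradient of f(x) = x^2
        delay = rng.randint(0, max_delay)   # communication/compute delay
        pending.append((t + delay, grad))
        # ... while the master applies whatever updates have arrived,
        # using a decaying step size to amortize staleness.
        lr = lr0 / (t + 1)
        arrived = [g for (due, g) in pending if due <= t]
        pending = [(due, g) for (due, g) in pending if due > t]
        for g in arrived:
            x -= lr * g
    return x

# Residual distance to the optimum x* = 0; small despite stale updates.
print(abs(dasgd_simulation()))
```

In this toy setting the delayed gradients still point roughly toward the optimum, and the decaying step size shrinks their impact, so the iterate settles near zero; the paper's contribution is analyzing when such convergence survives delays that grow without bound.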