Parallel SGD: When does averaging help?
arXiv: Machine Learning, Volume abs/1606.07365, 2016.
Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice. We study model averaging as a variance-reducing mechanism and describe two ways in which the frequency of averaging affects convergence. For convex objectives, we show...
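To make the setup concrete, the following is a minimal sketch of periodic model averaging, not the paper's implementation. It assumes a least-squares objective on synthetic data, a fixed step size, and sequential simulation of the workers on one machine; the names `parallel_sgd`, `num_workers`, `avg_every`, and `mse` are hypothetical and chosen for illustration only.

```python
import numpy as np


def parallel_sgd(X, y, num_workers=4, avg_every=10, steps=400, lr=0.05, seed=0):
    """Simulate workers running SGD independently, averaging every `avg_every` steps.

    Assumption: squared loss 0.5 * (x_i @ w - y_i)**2 per sample, and every
    worker samples uniformly from the same pool of data.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((num_workers, d))  # one model (row) per worker

    for t in range(1, steps + 1):
        for m in range(num_workers):
            i = rng.integers(n)                     # worker m draws its own sample
            grad = (X[i] @ W[m] - y[i]) * X[i]      # stochastic gradient of the squared loss
            W[m] -= lr * grad
        if t % avg_every == 0:
            W[:] = W.mean(axis=0)                   # all workers restart from the average

    return W.mean(axis=0)                           # final averaged model


def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))


# Synthetic data (an assumption for illustration, not from the paper).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * data_rng.normal(size=500)

w_avg = parallel_sgd(X, y)
initial_loss = mse(X, y, np.zeros(5))
final_loss = mse(X, y, w_avg)
```

Setting `avg_every=1` averages after every step, while `avg_every=steps` averages only once at the end; varying this parameter is one way to experiment with how averaging frequency affects the result.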