Parallel SGD: When does averaging help?

    arXiv: Machine Learning, Volume abs/1606.07365, 2016.


    Abstract:

    Consider a number of workers running SGD independently on the same pool of data and averaging the models every once in a while -- a common but not well understood practice. We study model averaging as a variance-reducing mechanism and describe two ways in which the frequency of averaging affects convergence. For convex objectives, we show...
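
The setup the abstract describes can be illustrated with a minimal sketch. This is not the authors' code or experimental setup: the scalar least-squares objective, the function names `make_data` and `parallel_sgd`, and every parameter (`num_workers`, `avg_every`, `lr`, and so on) are assumptions chosen for illustration. Each worker takes SGD steps on samples drawn independently from one shared pool, and every `avg_every` steps all workers are reset to the mean of their models.

```python
import random


def make_data(n=100, true_w=3.0, noise=0.1, seed=1):
    """Hypothetical shared data pool: y = true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, true_w * x + rng.gauss(0.0, noise)))
    return data


def parallel_sgd(data, num_workers=4, steps=200, avg_every=10, lr=0.05, seed=0):
    """Simulate workers running SGD independently with periodic model averaging.

    Illustrative assumption: the model is a single scalar weight w and the
    per-sample loss is (w * x - y) ** 2. Workers are simulated sequentially.
    """
    rng = random.Random(seed)
    weights = [0.0] * num_workers  # every worker starts from the same model

    for t in range(1, steps + 1):
        for k in range(num_workers):
            x, y = rng.choice(data)  # each worker samples from the same pool
            grad = 2.0 * (weights[k] * x - y) * x
            weights[k] -= lr * grad

        # Averaging step: replace every worker's model with the mean model.
        if t % avg_every == 0:
            mean_w = sum(weights) / num_workers
            weights = [mean_w] * num_workers

    # Return the average of the workers' final models.
    return sum(weights) / num_workers


if __name__ == "__main__":
    pool = make_data()
    for period in (1, 10, 1000):  # 1000 > steps, so averaging happens only at the end
        print(period, parallel_sgd(pool, avg_every=period))
```

In this sketch, `avg_every=1` averages after every step, while a value larger than `steps` leaves the workers fully independent until the single final average. Varying it between those extremes is one way to explore how averaging frequency interacts with the variance of the resulting model.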
