Don't Decay the Learning Rate, Increase the Batch Size

    Samuel L. Smith

    International Conference on Learning Representations (ICLR), 2018.


    Abstract:

    It is common practice to decay the learning rate. Here we show one can usually obtain the same learning curve on both training and test sets by instead increasing the batch size during training. This procedure is successful for stochastic gradient descent (SGD), SGD with momentum, Nesterov momentum, and Adam. It reaches equivalent test accuracies…
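    The equivalence claimed in the abstract can be sketched as follows. A minimal illustration (not the authors' code; the names and constants here are illustrative assumptions): SGD's gradient-noise scale is proportional to the learning rate divided by the batch size, so multiplying the batch size by a factor k at each stage gives the same noise-scale schedule as dividing the learning rate by k.

    ```python
    # Hedged sketch: compare the two equivalent training schedules.
    # The paper's observation: SGD's noise scale is ~ eps / B (learning
    # rate over batch size), so decaying eps by a factor k per stage and
    # growing B by the same factor k per stage trace the same schedule.

    def schedule(base_lr, base_batch, factor, n_stages, mode):
        """Return (learning_rate, batch_size) pairs, one per training stage."""
        stages = []
        for s in range(n_stages):
            if mode == "decay_lr":
                # conventional schedule: shrink the learning rate each stage
                stages.append((base_lr * factor ** -s, base_batch))
            else:  # mode == "increase_batch"
                # the paper's alternative: grow the batch size instead
                stages.append((base_lr, base_batch * factor ** s))
        return stages

    decay = schedule(0.1, 128, 5, 3, "decay_lr")
    grow = schedule(0.1, 128, 5, 3, "increase_batch")

    # The noise scale eps / B matches stage by stage in both schedules.
    for (lr_d, b_d), (lr_g, b_g) in zip(decay, grow):
        assert abs(lr_d / b_d - lr_g / b_g) < 1e-12
    ```

    The batch-size-increase variant runs the same number of epochs but far fewer parameter updates in the later stages, which is where the parallelism benefit described in the abstract comes from.
    
    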
