On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length


    ICLR, 2019.


    Training deep neural networks with Stochastic Gradient Descent (SGD) using a large learning rate or a small batch size typically ends in flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. This has been found to correlate with good final generalization performance. In this paper we e...
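A minimal sketch of how the sharpness notion in the abstract is typically measured: the largest Hessian eigenvalue of the loss can be estimated by power iteration using only Hessian-vector products. For illustration this sketch uses an explicit Hessian matrix of a toy quadratic loss (in practice one would use autodiff Hessian-vector products instead of materializing the Hessian); the function names here are illustrative, not from the paper.

```python
import numpy as np

def hessian_vector_product(hess, v):
    # Stand-in for an autodiff Hvp; here the Hessian is explicit.
    return hess @ v

def top_hessian_eigenvalue(hess, iters=200, seed=0):
    # Power iteration: repeatedly applying the Hessian to a random
    # vector converges to the eigenvector of largest |eigenvalue|.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(hess.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hessian_vector_product(hess, v)
        v = hv / np.linalg.norm(hv)
    # Rayleigh quotient gives the corresponding eigenvalue.
    return v @ hessian_vector_product(hess, v)

# Toy loss L(w) = 0.5 * w^T H w with a known curvature spectrum:
# the sharpest direction has curvature 5.0.
H = np.diag([5.0, 1.0, 0.1])
print(top_hessian_eigenvalue(H))
```

Small eigenvalues along all directions correspond to the "flat regions" the abstract refers to; the paper studies how the SGD step length relates to curvature along the sharpest such directions.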