Three Factors Influencing Minima in SGD

    Cited by: 21

    arXiv: Learning, Volume abs/1711.04623, 2018

    Abstract:

    We study the statistical properties of the endpoint of stochastic gradient descent (SGD). We approximate SGD as a stochastic differential equation (SDE) and consider its Boltzmann-Gibbs equilibrium distribution under the assumption of isotropic variance in loss gradients. Through this analysis, we find that three factors – learning rate, …
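The SDE picture in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only and rests on assumptions that are ours, not taken from the abstract: a one-dimensional quadratic loss L(θ) = kθ²/2, and minibatch gradients equal to the true gradient plus Gaussian noise of variance σ²/S for batch size S (the isotropic case). Under these assumptions, SGD with learning rate η behaves like the SDE dθ = −∇L dt + sqrt(ησ²/S) dW, whose equilibrium is the Boltzmann-Gibbs density p(θ) ∝ exp(−L(θ)/T) with temperature T = ησ²/(2S). For the quadratic this is a Gaussian with variance T/k, which the simulation compares against the empirical spread of the SGD iterates. The function names are hypothetical.

```python
import math
import random


def sgd_stationary_variance(k, lr, batch_size, sigma,
                            steps=400_000, burn_in=20_000, seed=0):
    """Run noisy SGD on the toy loss L(theta) = k * theta**2 / 2 and return
    the empirical variance of theta after a burn-in period.

    Assumption (illustrative): each minibatch gradient is k * theta plus
    Gaussian noise with variance sigma**2 / batch_size.
    """
    rng = random.Random(seed)
    noise_std = sigma / math.sqrt(batch_size)
    theta = 0.0
    total = 0.0
    total_sq = 0.0
    n = 0
    for step in range(steps):
        grad = k * theta + rng.gauss(0.0, noise_std)  # noisy gradient estimate
        theta -= lr * grad                             # plain SGD update
        if step >= burn_in:                            # collect equilibrium samples
            total += theta
            total_sq += theta * theta
            n += 1
    mean = total / n
    return total_sq / n - mean * mean


def gibbs_variance(k, lr, batch_size, sigma):
    """Variance of p(theta) ∝ exp(-L(theta) / T), T = lr * sigma**2 / (2 * batch_size).

    For L = k * theta**2 / 2 this density is Gaussian with variance T / k.
    """
    temperature = lr * sigma ** 2 / (2 * batch_size)
    return temperature / k


empirical = sgd_stationary_variance(k=1.0, lr=0.01, batch_size=10, sigma=1.0)
predicted = gibbs_variance(k=1.0, lr=0.01, batch_size=10, sigma=1.0)
print(f"empirical variance {empirical:.3e}, Gibbs prediction {predicted:.3e}")
```

In this toy model the temperature depends on η and S only through the ratio η/S, so doubling both the learning rate and the batch size leaves the predicted equilibrium spread unchanged; the discrete-time simulation agrees with the continuous-time prediction only while ηk is small.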