A Bayesian Perspective on Generalization and Stochastic Gradient Descent
international conference on learning representations, 2018.
We consider two related questions at the heart of machine learning; how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work responds to Zhang et al. (2016), who showed deep neural networks can easily memorize randomly labeled training data, despit...More
Full Text (Upload PDF)