How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
CoRR (2024)
Abstract
Background. A main theoretical puzzle is why over-parameterized Neural
Networks (NNs) generalize well when trained to zero loss (i.e., so they
interpolate the data). Usually, the NN is trained with Stochastic Gradient
Descent (SGD) or one of its variants. However, recent empirical work examined
the generalization of a random NN that interpolates the data: the NN was
sampled from a seemingly uniform prior over the parameters, conditioned on
the NN perfectly classifying the training set. Interestingly, such an NN
sample typically generalized as well as SGD-trained NNs.
Contributions. We prove that such a random NN interpolator typically
generalizes well if there exists an underlying narrow “teacher NN” that agrees
with the labels. Specifically, we show that such a “flat” prior over the NN
parameterization induces a rich prior over the NN functions, due to the
redundancy in the NN structure. In particular, this creates a bias towards
simpler functions, which require fewer relevant parameters to represent,
enabling learning with a sample complexity approximately proportional to the
complexity of the teacher (roughly, its number of non-redundant parameters)
rather than the student's.
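The sampling procedure the abstract describes can be illustrated with a minimal "Guess & Check" sketch: a narrow teacher labels a tiny training set, and wide student networks are drawn i.i.d. from a Gaussian (approximately "flat") prior over parameters until one perfectly classifies the data. All sizes, the width-1 linear teacher, and the one-hidden-layer student architecture here are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical narrow "teacher": a width-1 linear classifier labels
# 4 training points in 2D (tiny on purpose, so rejection sampling succeeds).
X = rng.standard_normal((4, 2))
w_teacher = rng.standard_normal(2)
y = np.sign(X @ w_teacher)  # labels in {-1, +1}

def sample_student(width=50):
    """Draw a wide one-hidden-layer NN with i.i.d. Gaussian weights."""
    W = rng.standard_normal((width, 2))
    a = rng.standard_normal(width)
    return lambda x: np.sign(relu(x @ W.T) @ a)

# Guess & Check: sample from the (seemingly uniform) prior over parameters,
# keep the first student that interpolates the training set.
student = None
for attempt in range(100_000):
    f = sample_student()
    if np.all(f(X) == y):
        student = f
        break

assert student is not None
print("interpolating student found after", attempt + 1, "samples")
```

This rejection loop is exponentially expensive in the number of training points; it is only meant to make the conditional distribution ("uniform prior, conditioned on interpolation") concrete, not to be a practical training method.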