Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
arXiv (2023)
Abstract
Recent work by Power et al. (2022) highlighted a surprising "grokking"
phenomenon in learning arithmetic tasks: a neural net first "memorizes" the
training set, resulting in perfect training accuracy but near-random test
accuracy, and after training for sufficiently longer, it suddenly transitions
to perfect test accuracy. This paper studies the grokking phenomenon in
theoretical setups and shows that it can be induced by a dichotomy of early and
late phase implicit biases. Specifically, when training homogeneous neural nets
with large initialization and small weight decay on both classification and
regression tasks, we prove that the training process gets trapped at a solution
corresponding to a kernel predictor for a long time, and then a very sharp
transition to min-norm/max-margin predictors occurs, leading to a dramatic
change in test accuracy.
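As a concrete illustration of the setup the abstract describes, below is a minimal PyTorch sketch of training a homogeneous (bias-free) two-layer ReLU net with a large initialization scale and small weight decay. This is not the paper's experimental configuration: the synthetic task, network width, and all hyperparameters (`alpha`, `wd`, `lr`, `steps`) are illustrative assumptions chosen only to show the ingredients named in the abstract.

```python
# Hedged sketch of the abstract's setup, NOT the paper's exact experiment:
# a homogeneous two-layer ReLU net (no biases), initialization scaled up by
# `alpha`, trained with small weight decay `wd` on an assumed synthetic
# binary classification task. All hyperparameters are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, width, n_train, n_test = 20, 256, 100, 1000
alpha, wd, lr, steps = 8.0, 1e-4, 0.01, 200_000  # large init, small decay

# Assumed toy data: label is the sign of the first coordinate.
X = torch.randn(n_train + n_test, d)
y = torch.sign(X[:, 0])
X_tr, y_tr, X_te, y_te = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

class TwoLayerReLU(nn.Module):
    """Bias-free two-layer ReLU net: positively homogeneous in its weights."""
    def __init__(self, d, width):
        super().__init__()
        self.fc1 = nn.Linear(d, width, bias=False)
        self.fc2 = nn.Linear(width, 1, bias=False)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x))).squeeze(-1)

model = TwoLayerReLU(d, width)
with torch.no_grad():
    for p in model.parameters():
        p.mul_(alpha)  # scale up the initialization ("large initialization")

opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=wd)
loss_fn = nn.SoftMarginLoss()  # logistic loss on +/-1 labels

for step in range(steps):
    opt.zero_grad()
    loss = loss_fn(model(X_tr), y_tr)
    loss.backward()
    opt.step()
    if step % 10_000 == 0:
        with torch.no_grad():
            tr_acc = (model(X_tr).sign() == y_tr).float().mean()
            te_acc = (model(X_te).sign() == y_te).float().mean()
        # A grokking-like run would show train accuracy near 1 early while
        # test accuracy stays low, with a sharp test jump much later.
        print(f"step {step:7d}  loss {loss.item():.4f}  "
              f"train {tr_acc:.2f}  test {te_acc:.2f}")
```

In the paper's framing, the large initialization keeps the early dynamics close to a kernel predictor (which fits the training set but may generalize poorly), while the small weight decay slowly shrinks the weight norm until the min-norm/max-margin solution takes over; whether and when the sharp transition appears in a toy run like this depends on the assumed `alpha`, `wd`, and training horizon.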