Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
ICLR 2024
Abstract
Most interpretability research in NLP focuses on understanding the behavior
and features of a fully trained model. However, certain insights into model
behavior may only be accessible by observing the trajectory of the training
process. We present a case study of syntax acquisition in masked language
models (MLMs) that demonstrates how analyzing the evolution of interpretable
artifacts throughout training deepens our understanding of emergent behavior.
In particular, we study Syntactic Attention Structure (SAS), a naturally
emerging property of MLMs wherein specific Transformer heads tend to focus on
specific syntactic relations. We identify a brief window in pretraining when
models abruptly acquire SAS, concurrent with a steep drop in loss. This
breakthrough precipitates the subsequent acquisition of linguistic
capabilities. We then examine the causal role of SAS by manipulating it during
training, and demonstrate that SAS is necessary for the development of
grammatical capabilities. We further find that SAS competes with other
beneficial traits during training, and that briefly suppressing SAS improves
model quality. These findings offer an interpretation of a real-world example
of both simplicity bias and breakthrough training dynamics.
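As a concrete illustration of how SAS can be quantified, the sketch below scores each attention head by how often its argmax attention recovers gold dependency arcs, then takes the best head as a summary statistic. The function names (head_attachment_accuracy, sas_score) and the best-head aggregation are illustrative assumptions, not the paper's exact protocol; it assumes attention tensors from a HuggingFace-style MLM run with output_attentions=True and word-level gold dependency heads.

```python
# Minimal SAS-measurement sketch (illustrative, not the paper's exact metric).
import torch

def head_attachment_accuracy(attn, gold_heads):
    """attn: (num_heads, seq_len, seq_len) attention for one layer/sentence.
    gold_heads: (seq_len,) long tensor; index of each token's syntactic head,
    with -1 marking root/special tokens to skip.
    Returns, per head, the fraction of tokens whose argmax attention lands
    on the gold dependency head."""
    mask = gold_heads >= 0                      # ignore root / special tokens
    preds = attn.argmax(dim=-1)                 # (num_heads, seq_len)
    correct = (preds == gold_heads.unsqueeze(0)) & mask.unsqueeze(0)
    return correct.sum(dim=-1).float() / mask.sum().clamp(min=1)

def sas_score(all_attn, gold_heads):
    """all_attn: list over layers of (num_heads, seq_len, seq_len) attentions.
    SAS proxy: attachment accuracy of the single best head across all layers."""
    per_head = torch.cat([head_attachment_accuracy(a, gold_heads) for a in all_attn])
    return per_head.max().item()
```

Tracking such a score on a held-out treebank at each pretraining checkpoint is how one would observe the brief window described above: a sharp rise in the SAS proxy coinciding with the steep drop in MLM loss.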
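The causal manipulation can likewise be pictured as a regularizer on the attention mass that heads place on each token's dependency neighbors, added to the MLM loss with a coefficient lam: positive to suppress SAS, negative to promote it, zero to leave training unchanged. This is a hedged sketch of the idea under those assumptions, not the paper's exact objective.

```python
# Hedged sketch of a syntactic regularizer for manipulating SAS during training.
import torch

def sas_regularizer(all_attn, neighbor_mask, lam=1e-3):
    """all_attn: iterable over layers of (batch, heads, seq, seq) attentions.
    neighbor_mask: (batch, seq, seq) bool, True where token j is a dependency
    neighbor (head or child) of token i.
    Returns lam times the mean attention mass placed on dependency neighbors;
    lam > 0 suppresses SAS, lam < 0 promotes it."""
    total, count = 0.0, 0
    for attn in all_attn:
        # Attention mass each (batch, head, token) assigns to its neighbors.
        mass = (attn * neighbor_mask.unsqueeze(1).float()).sum(dim=-1)
        total = total + mass.mean()
        count += 1
    return lam * total / max(count, 1)

# Training step (sketch): loss = mlm_loss + sas_regularizer(attentions, mask, lam),
# e.g. with lam > 0 only for a brief early phase to suppress SAS, then lam = 0.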
Keywords
interpretability, BERT, syntax, phase changes, simplicity bias, training dynamics