Interpreting Grokked Transformers in Complex Modular Arithmetic
CoRR (2024)
Abstract
Grokking has been actively explored to uncover the mystery of delayed
generalization. Identifying interpretable algorithms inside grokked models
offers a suggestive clue to its mechanism. In this work, going beyond the
simplest and most well-studied case of modular addition, we observe the internal
circuits learned through grokking in complex modular arithmetic via
interpretable reverse engineering, which highlights significant differences in
their dynamics: subtraction imposes a strong asymmetry on the Transformer;
multiplication requires cosine-biased components at all frequencies in the
Fourier domain; polynomials often result in a superposition of the patterns
from elementary arithmetic, but clear patterns do not emerge in challenging
cases; and grokking can easily occur even in higher-degree formulas involving
basic symmetric and alternating expressions. We also introduce novel progress
measures for modular arithmetic, Fourier Frequency Sparsity and Fourier
Coefficient Ratio, which not only indicate late generalization but also
characterize the distinctive internal representations of grokked models for
each modular operation. Our empirical analysis emphasizes the importance of
holistic evaluation across various combinations.
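To make the tasks concrete, here is a minimal Python sketch (not the authors' code) of how the full input table for a binary modular operation can be built and split for a grokking experiment. The modulus `P = 97`, the training fraction, the seed, the helper names, and the degree-2 polynomial are hypothetical choices for illustration; the abstract does not specify them. The symmetry checks show that addition, multiplication, and the chosen polynomial are symmetric in their operands (`f(a, b) = f(b, a)`), whereas subtraction is alternating (`f(b, a) = -f(a, b) mod p`). This is a property of the operations themselves, not a reproduction of the learned circuits the paper analyzes.

```python
import itertools
import random

# Hypothetical modulus: the abstract does not say which prime the experiments use.
P = 97

# Binary operations named in the abstract; the polynomial is an assumed example
# of a symmetric higher-degree formula, not one taken from the paper.
OPERATIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "poly": lambda a, b: a * a + a * b + b * b,
}


def build_table(op, p=P):
    """Return the p x p table whose entry [a][b] is op(a, b) mod p."""
    return [[op(a, b) % p for b in range(p)] for a in range(p)]


def is_symmetric(table):
    """True if swapping the operands never changes the result."""
    n = len(table)
    return all(table[a][b] == table[b][a] for a in range(n) for b in range(n))


def is_alternating(table, p=P):
    """True if swapping the operands always negates the result mod p."""
    n = len(table)
    return all(table[a][b] == (-table[b][a]) % p for a in range(n) for b in range(n))


def make_dataset(op, p=P):
    """List every (a, b, c) triple with c = op(a, b) mod p."""
    return [(a, b, op(a, b) % p) for a, b in itertools.product(range(p), repeat=2)]


def train_test_split(dataset, train_fraction=0.3, seed=0):
    """Shuffle with a fixed seed and keep a (hypothetical) fraction for training."""
    rng = random.Random(seed)
    shuffled = list(dataset)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    for name, op in OPERATIONS.items():
        table = build_table(op)
        print(f"{name}: symmetric={is_symmetric(table)}, "
              f"alternating={is_alternating(table)}")
    train, test = train_test_split(make_dataset(OPERATIONS["sub"]))
    print(f"sub: {len(train)} train / {len(test)} test examples")
```

Running the script prints one line per operation; with the operations above, only `sub` is reported as non-symmetric and alternating. Python's `%` always returns a non-negative result for a positive modulus, so negative differences such as `3 - 5` map into `0..p-1` without extra handling.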