Investigating Recurrent Transformers with Dynamic Halt
arXiv (2024)
Abstract
In this paper, we study the inductive biases of two major approaches to
augmenting Transformers with a recurrent mechanism: (1) the approach of
incorporating a depth-wise recurrence similar to Universal Transformers; and
(2) the approach of incorporating a chunk-wise temporal recurrence like
Temporal Latent Bottleneck. Furthermore, we propose and investigate novel ways
to extend and combine the above methods; for example, we propose a global
mean-based dynamic halting mechanism for Universal Transformer and an
augmentation of Temporal Latent Bottleneck with elements from Universal
Transformer. We compare the models and probe their inductive biases on several
diagnostic tasks, such as Long Range Arena (LRA), flip-flop language modeling,
ListOps, and Logical Inference.