MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models
CoRR (2024)
Abstract
Transformer-based language models (LMs) track contextual information through
large, hard-coded input windows. We introduce MemoryPrompt, a leaner approach
in which the LM is complemented by a small auxiliary recurrent network that
passes information to the LM by prefixing its regular input with a sequence of
vectors, akin to soft prompts, without requiring LM finetuning. Tested on a
task designed to probe an LM's ability to keep track of multiple fact updates, a
MemoryPrompt-augmented LM outperforms much larger LMs that have access to the
full input history. We also test MemoryPrompt on a long-distance dialogue
dataset, where its performance is comparable to that of a model conditioned on
the entire conversation history. In both experiments we also observe that,
unlike full-finetuning approaches, MemoryPrompt does not suffer from
catastrophic forgetting when adapted to new tasks, thus not disrupting the
generalist capabilities of the underlying LM.
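To make the architecture described above concrete, the following is a minimal sketch of a MemoryPrompt-style wrapper. It is an illustrative reconstruction from the abstract, not the authors' released code: the class name `MemoryPromptWrapper`, the choice of a GRU cell, the mean-pooled segment summary, and all sizes (`n_memory_tokens`, `mem_dim`) are assumptions. The frozen LM, the small trainable recurrent network, and the soft-prompt vectors prepended to each segment's input follow the paper's description.

```python
# Hypothetical sketch of a MemoryPrompt-style wrapper (illustrative only).
# Assumptions: GPT-2 as the frozen base LM, a GRU cell as the auxiliary
# recurrent network, mean pooling as the segment summary.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

class MemoryPromptWrapper(nn.Module):
    def __init__(self, lm_name="gpt2", n_memory_tokens=5, mem_dim=128):
        super().__init__()
        self.lm = AutoModelForCausalLM.from_pretrained(lm_name)
        for p in self.lm.parameters():  # the LM itself is never finetuned
            p.requires_grad = False
        d = self.lm.config.hidden_size
        self.n_mem = n_memory_tokens
        self.mem_dim = mem_dim
        # Small auxiliary recurrent network carried across input segments.
        self.rnn = nn.GRUCell(d, mem_dim)
        # Projects the recurrent state to soft-prompt embeddings.
        self.to_prompt = nn.Linear(mem_dim, n_memory_tokens * d)

    def forward(self, input_ids, attention_mask, state=None):
        batch = input_ids.size(0)
        d = self.lm.config.hidden_size
        if state is None:
            state = input_ids.new_zeros(batch, self.mem_dim, dtype=torch.float)
        # Prefix the segment's token embeddings with memory-derived vectors,
        # akin to soft prompts.
        tok_emb = self.lm.get_input_embeddings()(input_ids)
        prompt = self.to_prompt(state).view(batch, self.n_mem, d)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
        prompt_mask = attention_mask.new_ones(batch, self.n_mem)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.lm(inputs_embeds=inputs_embeds, attention_mask=mask,
                      output_hidden_states=True)
        # Summarize the segment and update the recurrent memory for the
        # next segment (mean pooling is an assumption of this sketch).
        seg_repr = out.hidden_states[-1][:, self.n_mem:].mean(dim=1)
        new_state = self.rnn(seg_repr, state)
        # Return logits for the real tokens only, plus the carried state.
        return out.logits[:, self.n_mem:], new_state
```

In use, a long input would be split into segments fed sequentially, with `state` threaded from one call to the next so that earlier fact updates can influence later predictions without re-feeding the full history; only the GRU and the projection layer receive gradients, which is consistent with the paper's claim that the frozen LM's generalist capabilities are left intact.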