Where does In-context Translation Happen in Large Language Models
arXiv (2024)
Abstract
Self-supervised large language models have demonstrated the ability to
perform Machine Translation (MT) via in-context learning, but little is known
about where the model performs the task with respect to prompt instructions and
demonstration examples. In this work, we attempt to characterize the region
where large language models transition from in-context learners to translation
models. Through a series of layer-wise context-masking experiments on
GPT-Neo 2.7B, BLOOM 3B, Llama 7B, and
Llama 7B-chat, we demonstrate evidence of a "task recognition" point
where the translation task is encoded into the input representations and
attention to context is no longer necessary. We further observe a correspondence
between the layers whose removal most degrades performance and the task
recognition layers. Exploiting this redundancy yields 45%
computational savings when prompting with 5 examples, with task recognition
achieved at layer 14 of 32. Our layer-wise fine-tuning experiments indicate that
the most effective layers for MT fine-tuning are the layers critical to task
recognition.
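The layer-wise context-masking idea described above can be illustrated with a minimal sketch: from a chosen layer onward, attention from the positions after the context (the text being translated) back to the prompt and demonstration positions is blocked, so any later layers can only use task information already encoded into the non-context representations. The toy model, dimensions, and the `context_len` / `mask_from_layer` parameters below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (an assumption, not the authors' code) of layer-wise context masking:
# from layer `mask_from_layer` onward, attention from post-context positions to the
# context span is blocked, on top of the usual causal mask.
import torch
import torch.nn as nn

class MaskableLM(nn.Module):
    def __init__(self, vocab=1000, d_model=64, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, ids, context_len, mask_from_layer=None):
        T = ids.size(1)
        # Standard causal mask: True entries are positions a token may NOT attend to.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.embed(ids)
        for i, layer in enumerate(self.layers):
            attn_mask = causal.clone()
            if mask_from_layer is not None and i >= mask_from_layer:
                # Block attention from post-context positions to the context span.
                attn_mask[context_len:, :context_len] = True
            h = layer(h, src_mask=attn_mask)
        return self.lm_head(h)

model = MaskableLM()
ids = torch.randint(0, 1000, (1, 20))   # e.g. 12 prompt/demonstration tokens + 8 translation tokens
full = model(ids, context_len=12)                        # unmasked baseline
masked = model(ids, context_len=12, mask_from_layer=2)   # context masked from layer 2 onward
```

Comparing translation quality between the unmasked run and runs with masking applied from successively earlier layers is, in spirit, how a "task recognition" layer can be located: once masking above some layer no longer hurts performance, attention to the context is no longer needed.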