How do Large Language Models Handle Multilingualism?
CoRR (2024)
Abstract
Large language models (LLMs) demonstrate remarkable performance across a
spectrum of languages. In this work, we delve into the question: How do LLMs
handle multilingualism? We introduce a framework that depicts LLMs' processing
of multilingual inputs: In the first several layers, LLMs understand the
question, converting multilingual inputs into English to facilitate the
task-solving phase. In the intermediate layers, LLMs engage in problem-solving
by thinking in English and incorporating multilingual knowledge to obtain
factual content, leveraging the self-attention and feed-forward structures,
respectively. In the last several layers, LLMs generate responses that align
with the original language of the query. In addition, we investigate the
existence of language-specific neurons that are activated when processing a
certain language. To detect such neurons, even without any labeled data, we
design a novel Parallel Language-specific Neuron Detection (PLND) method that
effectively measures the significance of neurons when handling multilingual
inputs. Through a comprehensive ablation analysis in which we deactivate
neurons of different layers and structures, we verify the framework we
propose. Additionally, we demonstrate that this framework can be used to
enhance multilingual ability with much less training effort.
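
The abstract does not spell out PLND's exact formulation, but a minimal sketch of the underlying idea, scoring each feed-forward neuron by how much the block's output shifts when that single neuron is silenced, might look as follows. The toy FFN class, the layer sizes, and the top-k selection are illustrative assumptions, not the paper's actual settings.

import torch
import torch.nn as nn

class FFN(nn.Module):
    # Toy stand-in for one transformer feed-forward block:
    # up-projection -> activation ("neurons") -> down-projection.
    def __init__(self, d_model=16, d_ff=64):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.act = nn.GELU()

    def forward(self, x, mask=None):
        h = self.act(self.up(x))   # per-neuron activations, shape (..., d_ff)
        if mask is not None:       # optionally silence selected neurons
            h = h * mask
        return self.down(h)

def neuron_importance(ffn, x):
    # Score each neuron by how much the block's output changes when that
    # single neuron is zeroed, averaged over a batch of hidden states.
    d_ff = ffn.up.out_features
    scores = torch.empty(d_ff)
    with torch.no_grad():
        base = ffn(x)
        for i in range(d_ff):
            mask = torch.ones(d_ff)
            mask[i] = 0.0
            scores[i] = (base - ffn(x, mask)).norm(dim=-1).mean()
    return scores

# Hidden states collected from inputs of a single language (random here);
# neurons that score high for this language but not for others are
# candidates for being language-specific.
x = torch.randn(8, 16)
scores = neuron_importance(FFN(), x)
print(scores.topk(5).indices)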
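
For the ablation side, deactivating a chosen set of neurons at inference time can be done with a PyTorch forward hook that zeroes the corresponding output dimensions of a layer. This is a hedged sketch of one plausible setup, not the paper's implementation: the LLaMA-style module path model.model.layers[i].mlp.up_proj, the layer range, and ids_per_layer are assumptions for illustration.

import torch

def deactivate_neurons(linear, neuron_ids):
    # Zero the given output dimensions of a linear layer on every forward
    # pass, simulating "switching off" those neurons during evaluation.
    def hook(module, inputs, output):
        output[..., neuron_ids] = 0.0
        return output
    return linear.register_forward_hook(hook)

# Hypothetical usage on a LLaMA-style checkpoint: silence the detected
# neurons in, say, the first 8 ("understanding") layers, re-run a
# multilingual benchmark, and compare against the unablated model.
# handles = [deactivate_neurons(layer.mlp.up_proj, ids)
#            for layer, ids in zip(model.model.layers[:8], ids_per_layer)]
# ... evaluate ...
# for h in handles:
#     h.remove()   # restore the original model

Under the proposed framework, one would expect deactivating such neurons to degrade performance in the corresponding language while leaving English largely intact.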