Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Annual Meeting of the Association for Computational Linguistics (2024)
Beijing Key Laboratory of Big Data Management
Abstract
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on two representative LLMs, namely LLaMA-2 and BLOOM. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility of "steering" the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence for the understanding and exploration of the multilingual capabilities of LLMs.
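The abstract describes LAPE as an entropy taken over per-language activation probabilities: a neuron whose activations concentrate on one or a few languages has low entropy and is flagged as language-specific. Below is a minimal sketch of that detection step in Python, assuming per-neuron activation probabilities have already been estimated from monolingual corpora; the function and variable names are illustrative and not taken from the paper's released code.

```python
import numpy as np

def lape_scores(activation_probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Compute a LAPE-style entropy score for each neuron.

    activation_probs: array of shape (num_languages, num_neurons), where entry
    (l, j) is the estimated probability that neuron j fires (activation > 0)
    on text in language l. Lower entropy means the neuron activates mostly for
    a few languages, i.e. it is more language-specific.
    """
    # Normalize each neuron's activation probabilities across languages
    # so that they form a distribution over languages.
    totals = activation_probs.sum(axis=0, keepdims=True) + eps
    lang_dist = activation_probs / totals
    # Entropy of that distribution: lower entropy -> more language-specific.
    return -(lang_dist * np.log(lang_dist + eps)).sum(axis=0)

# Example: flag the 1% of neurons with the lowest entropy as "language-specific"
# (the probabilities here are random placeholders, not real measurements).
probs = np.random.rand(6, 11008)   # e.g. 6 languages, 11008 FFN neurons in one layer
scores = lape_scores(probs)
k = int(0.01 * scores.size)
language_specific = np.argsort(scores)[:k]
```

The "steering" result mentioned in the abstract would then amount to forcing the activations of these selected neurons on or off at inference time; the sketch above covers only the detection step.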