Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Annual Meeting of the Association for Computational Linguistics (2024)
Beijing Key Laboratory of Big Data Management
Abstract
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on two representative LLMs, namely LLaMA-2 and BLOOM. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility of "steering" the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence for the understanding and exploration of the multilingual capabilities of LLMs.
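The abstract describes LAPE as an entropy taken over per-language activation probabilities: a neuron whose activations concentrate on one or a few languages has low entropy and is flagged as language-specific. Below is a minimal sketch of that detection step in Python, assuming per-neuron activation probabilities have already been estimated from monolingual corpora; the function and variable names are illustrative and not taken from the paper's released code.

```python
import numpy as np

def lape_scores(activation_probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Compute a LAPE-style entropy score for each neuron.

    activation_probs: array of shape (num_languages, num_neurons), where entry
    (l, j) is the estimated probability that neuron j fires (activation > 0)
    on text in language l. Lower entropy means the neuron activates mostly for
    a few languages, i.e. it is more language-specific.
    """
    # Normalize each neuron's activation probabilities across languages
    # so that they form a distribution over languages.
    totals = activation_probs.sum(axis=0, keepdims=True) + eps
    lang_dist = activation_probs / totals
    # Entropy of that distribution: lower entropy -> more language-specific.
    return -(lang_dist * np.log(lang_dist + eps)).sum(axis=0)

# Example: flag the 1% of neurons with the lowest entropy as "language-specific"
# (the probabilities here are random placeholders, not real measurements).
probs = np.random.rand(6, 11008)   # e.g. 6 languages, 11008 FFN neurons in one layer
scores = lape_scores(probs)
k = int(0.01 * scores.size)
language_specific = np.argsort(scores)[:k]
```

The "steering" result mentioned in the abstract would then amount to forcing the activations of these selected neurons on or off at inference time; the sketch above covers only the detection step.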