Large Language Models for Multilingual Slavic Named Entity Linking

Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023), 2023

Abstract
This paper describes our submission for the 4th Shared Task on SlavNER, covering three Slavic languages: Czech, Polish, and Russian. We use the pre-trained multilingual XLM-R language model (Conneau et al., 2020) and fine-tune it on the three Slavic languages using datasets provided by the organizers. Our multilingual NER model achieves an F-score of 0.896 across all corpora, with the best result for Czech (0.914) and the worst for Russian (0.880). Our cross-language entity linking module achieves an F-score of 0.669 in the official SlavNER 2023 evaluation.
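The abstract does not include implementation details. The following is a minimal, hypothetical sketch of what fine-tuning XLM-R for token-level NER can look like with the Hugging Face transformers library; the label set, data file names, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Hypothetical sketch: fine-tuning XLM-R for Slavic NER with Hugging Face
# transformers. Label set, file names, and hyperparameters are assumed,
# not taken from the paper.
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
label2id = {l: i for i, l in enumerate(LABELS)}
id2label = {i: l for l, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(LABELS),
    id2label=id2label, label2id=label2id,
)

def tokenize_and_align(batch):
    # XLM-R splits words into subwords; re-align the word-level BIO tags
    # so special tokens and continuation subwords get the ignore index -100.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, word_labels in enumerate(batch["ner_tags"]):
        prev, aligned = None, []
        for wid in enc.word_ids(batch_index=i):
            aligned.append(word_labels[wid]
                           if wid is not None and wid != prev else -100)
            prev = wid
        enc["labels"].append(aligned)
    return enc

# Assumed: JSON files with "tokens" and "ner_tags" fields, pooled over
# the Czech, Polish, and Russian training data.
raw = load_dataset("json", data_files={"train": "slavner_train.json",
                                       "validation": "slavner_dev.json"})
data = raw.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-slavner", learning_rate=2e-5,
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```

Because XLM-R shares one subword vocabulary across languages, a single fine-tuned model can serve all three languages at once, which matches the multilingual setup described in the abstract.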
Key words
Language Modeling, Multilingual Neural Machine Translation, Named Entity Recognition

Key points: This paper fine-tunes the pre-trained multilingual XLM-R language model for three Slavic languages (Czech, Polish, and Russian), enabling effective multilingual named entity linking and achieving strong results in the SlavNER 2023 evaluation.

Method: The study uses the XLM-R language model proposed by Conneau et al. (2020) and fine-tunes it on the Slavic-language datasets provided by the organizers to improve performance on named entity recognition and cross-language entity linking.

Experiments: Experiments were conducted on the SlavNER task datasets. The model reaches an F-score of 0.896 across all corpora, with the best result for Czech (0.914) and the lowest for Russian (0.880); the cross-language entity linking module achieves an F-score of 0.669 in the official SlavNER 2023 evaluation.
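The linking module itself is not detailed here. One common approach to cross-language entity linking, shown below as a hypothetical sketch and not necessarily the authors' method, is to compare mention embeddings from the same multilingual encoder; the model choice and similarity threshold are assumptions.

```python
# Hypothetical sketch of cross-language entity linking: decide whether two
# surface forms from different languages refer to the same entity by
# comparing XLM-R mention embeddings. Not the paper's documented method.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def embed(mention: str) -> torch.Tensor:
    # Mean-pool the last hidden states into a fixed-size mention vector.
    inputs = tokenizer(mention, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

def same_entity(m1: str, m2: str, threshold: float = 0.9) -> bool:
    # Link two mentions if their embeddings are sufficiently similar;
    # the 0.9 threshold is an arbitrary illustrative value.
    v1, v2 = embed(m1), embed(m2)
    return torch.cosine_similarity(v1, v2, dim=0).item() >= threshold

# e.g. Polish and Czech surface forms of the same city:
print(same_entity("Warszawa", "Varšava"))
```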