UZNER: A Benchmark for Named Entity Recognition in Uzbek.

Aizihaierjiang Yusufu, Liu Jiang, Abidan Ainiwaer, Chong Teng, Aizierguli Yusufu, Fei Li, Donghong Ji

NLPCC (1) (2023)

Abstract
Named entity recognition (NER) is a key task in natural language processing, and it provides semantic information needed by many downstream tasks. However, NER performance is often limited by the richness of the available language resources. For low-resource languages, NER usually performs poorly because sufficient labeled data and pre-trained models are lacking. To address this issue, we manually constructed a large-scale, high-quality Uzbek NER corpus and experimented with various NER methods. We improved state-of-the-art baseline models by introducing additional features and data translation. Data translation enables the model to learn richer syntactic structure and semantic information. Affix features provide knowledge at the morphological level and play an important role in identifying oversimplified low-frequency entity labels. Our data and models will be made available to facilitate low-resource NER.
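To illustrate what token-level affix features of the kind mentioned in the abstract might look like, here is a minimal Python sketch. It is not the authors' implementation: the function name `affix_features`, the `max_len` parameter, and the choice of fixed-length character prefixes and suffixes are all assumptions made for illustration. The example word "Toshkentda" ("in Tashkent") carries the Uzbek locative suffix "-da".

```python
def affix_features(token: str, max_len: int = 3) -> dict[str, str]:
    """Return character prefixes and suffixes of a token, up to max_len characters.

    Hypothetical helper: fixed-length affixes are a common, simple proxy for
    morphological information; the paper may define its features differently.
    """
    lowered = token.lower()
    features: dict[str, str] = {}
    for n in range(1, max_len + 1):
        # Skip affixes that would cover the whole token.
        if len(lowered) > n:
            features[f"prefix_{n}"] = lowered[:n]
            features[f"suffix_{n}"] = lowered[-n:]
    return features


# "Toshkentda" = "Toshkent" (Tashkent) + locative suffix "-da".
features = affix_features("Toshkentda")
print(features["suffix_2"])
```

Features like these would typically be concatenated with word or subword representations before tagging, so that a rare entity can still be recognized from a familiar case ending.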
Keywords
named entity recognition, Uzbek, benchmark