EconBERTa: Towards Robust Extraction of Named Entities in Economics.

Karim Lasri, Pedro Vitor Quinta de Castro, Mona Schirmer, Luis Eduardo San Martin, Linxi Wang, Tomás Dulka, Haaya Naushan, John Pougué-Biyong, Arianna Legovini, Samuel Fraiberger

EMNLP 2023 (2023)

Abstract
Adapting general-purpose language models has proven to be effective in tackling downstream tasks within specific domains. In this paper, we address the task of extracting entities from the economics literature on impact evaluation. To this end, we release EconBERTa, a large language model pretrained on scientific publications in economics, and ECON-IE, a new expert-annotated dataset of economics abstracts for Named Entity Recognition (NER). We find that EconBERTa reaches state-of-the-art performance on our downstream NER task. Additionally, we extensively analyze the model's generalization capacities, finding that most errors correspond to detecting only a subspan of an entity or failure to extrapolate to longer sequences. This limitation is primarily due to an inability to detect part-of-speech sequences unseen during training, and this effect diminishes when the number of unique instances in the training set increases. Examining the generalization abilities of domain-specific language models paves the way towards improving the robustness of NER models for causal knowledge extraction.
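
The error types named in the abstract (predicting only a subspan of an entity, as opposed to an exact match) can be illustrated with a small span comparison. The sketch below is not the authors' evaluation code. It assumes spans are given as token offsets with an exclusive end. The function name `classify_prediction`, the category names, and the labels "intervention" and "outcome" are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label). This layout is an assumption.
Span = Tuple[int, int, str]


def classify_prediction(pred: Span, gold: List[Span]) -> str:
    """Compare one predicted span against the gold spans of the same label.

    Returns one of "exact", "subspan", "overlap", or "spurious"
    (hypothetical category names, not taken from the paper).
    """
    p_start, p_end, p_label = pred
    best = "spurious"
    for g_start, g_end, g_label in gold:
        if g_label != p_label:
            continue  # only same-label gold spans count as matches
        if (p_start, p_end) == (g_start, g_end):
            return "exact"
        if g_start <= p_start and p_end <= g_end:
            best = "subspan"  # prediction lies strictly inside a gold entity
        elif p_start < g_end and g_start < p_end and best != "subspan":
            best = "overlap"  # boundaries cross without containment
    return best


# Hypothetical gold annotations and predictions for one abstract.
gold = [(0, 4, "intervention"), (7, 9, "outcome")]
preds = [(0, 4, "intervention"), (1, 3, "intervention"), (10, 12, "outcome")]
print([classify_prediction(p, gold) for p in preds])
```

Counting these categories over a test set would give a rough view of how often a model recovers only part of an entity, which is the kind of breakdown the abstract describes.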