Tuning Out the Noise: Benchmarking Entity Extraction for Digitized Native American Literature

Proceedings of the Association for Information Science and Technology (2023)

Abstract
Named Entity Recognition (NER), the automated identification and tagging of entities in text, is a popular natural language processing task and has the power to transform restricted data into open datasets of entities for further research. This project benchmarks four NER models (Stanford NER, BookNLP, spaCy-trf, and RoBERTa) to identify the most accurate approach and generate an open-access, gold-standard dataset of human-annotated entities. To meet a real-world use case, we benchmark these models on a sample dataset of sentences from Native American-authored literature, identifying edge cases and areas of improvement for future NER work.
Keywords
benchmarking entity extraction, digitized Native American literature