Shared Resources for Multilingual Information Extraction and Challenges in Named Entity Annotation

Shudong Huang, Alexis Mitchell

msra(2004)

Abstract
Progress in natural language processing requires increasing amounts of data and annotation in a growing variety of languages, and research in named entity extraction is no exception. While the value of richly annotated, large-scale multilingual corpora is undeniable, costs for producing such data are high, underscoring the value of shared resources. As part of the US Government-sponsored Automatic Content Extraction Program (ACE), the University of Pennsylvania's Linguistic Data Consortium has recently created a number of shared resources to support technology evaluations in multilingual information extraction. This paper discusses the challenges of multilingual corpus development, with a particular focus on Chinese named entities. It concludes with a description of the corpora developed to support this research.