Analysing Wikipedia and gold-standard corpora for NER training

EACL (2009)

Cited by 74 | 67 views
Abstract
Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named entity annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making them more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold-standard corpora on cross-corpus evaluation by up to 11%.
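For illustration only, the sketch below shows one way a cross-corpus NER evaluation grid can be organised: train on each corpus, test on every corpus (including held-out ones), and compare entity-level F1 across the pairs. Everything here is an assumption for the example, not the paper's setup: the corpus format (BIO-tagged sentences), the toy most-frequent-tag "tagger", and the placeholder corpus names do not reproduce the paper's actual tagger, corpora, or results.

```python
# A minimal, hypothetical sketch of a cross-corpus NER evaluation grid.
# Assumptions (not from the paper): each corpus is a list of sentences,
# each sentence a list of (token, BIO-tag) pairs, and the "tagger" is a
# toy most-frequent-tag baseline rather than a real NER system.
from collections import Counter, defaultdict

def train_baseline(corpus):
    """Learn the most frequent tag for each token (toy stand-in for training)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for token, label in sentence:
            counts[token][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def tag_sentence(model, sentence):
    """Predict a tag per token, defaulting to 'O' for unseen tokens."""
    return [model.get(token, "O") for token, _ in sentence]

def entity_spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    out, start, etype = set(), None, None
    for i, t in enumerate(list(tags) + ["O"]):
        if start is not None and not t.startswith("I-"):
            out.add((start, i, etype))
            start, etype = None, None
        if t.startswith("B-"):
            start, etype = i, t[2:]
    return out

def entity_f1(gold_corpus, predictions):
    """Entity-level F1 between gold BIO tags and predicted BIO tags."""
    tp = fp = fn = 0
    for sentence, pred in zip(gold_corpus, predictions):
        gold_spans = entity_spans([label for _, label in sentence])
        pred_spans = entity_spans(pred)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def cross_corpus_grid(corpora):
    """Train on each corpus and evaluate on every corpus: {(train, test): F1}."""
    results = {}
    for train_name, train_corpus in corpora.items():
        model = train_baseline(train_corpus)
        for test_name, test_corpus in corpora.items():
            preds = [tag_sentence(model, s) for s in test_corpus]
            results[(train_name, test_name)] = entity_f1(test_corpus, preds)
    return results

if __name__ == "__main__":
    # Tiny placeholder data standing in for MUC/CoNLL/BBN/Wikipedia corpora.
    corpora = {
        "conll": [[("John", "B-PER"), ("visited", "O"), ("Paris", "B-LOC")]],
        "wiki":  [[("Paris", "B-LOC"), ("hosted", "O"), ("John", "B-PER")]],
    }
    for (train, test), score in sorted(cross_corpus_grid(corpora).items()):
        print(f"train={train:6s} test={test:6s} F1={score:.2f}")
```

The off-diagonal cells of such a grid (train on one corpus, test on another) are what reveal the cross-corpus performance drop the abstract refers to; the diagonal cells give the usual in-corpus scores.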
Keywords
gold standard corpus, poor cross-corpus performance, cross-corpus evaluation, analysing Wikipedia, Wikipedia corpus, NER training, massive corpus, gold standard, costly manual annotation, entity recognition, comprehensive cross-corpus evaluation, entity annotated text, gold-standard corpus