Analysing Wikipedia and gold-standard corpora for NER training

EACL (2009)

Cited by 74 | 67 views
Abstract
Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named entity annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making them more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold-standard corpora on cross-corpus evaluation by up to 11%.
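For illustration only, the sketch below shows one way a cross-corpus NER evaluation grid can be organised: train on each corpus, test on every corpus (including held-out ones), and compare entity-level F1 across the pairs. Everything here is an assumption for the example, not the paper's setup: the corpus format (BIO-tagged sentences), the toy most-frequent-tag "tagger", and the placeholder corpus names do not reproduce the paper's actual tagger, corpora, or results.

```python
# A minimal, hypothetical sketch of a cross-corpus NER evaluation grid.
# Assumptions (not from the paper): each corpus is a list of sentences,
# each sentence a list of (token, BIO-tag) pairs, and the "tagger" is a
# toy most-frequent-tag baseline rather than a real NER system.
from collections import Counter, defaultdict

def train_baseline(corpus):
    """Learn the most frequent tag for each token (toy stand-in for training)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for token, label in sentence:
            counts[token][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def tag_sentence(model, sentence):
    """Predict a tag per token, defaulting to 'O' for unseen tokens."""
    return [model.get(token, "O") for token, _ in sentence]

def entity_spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    out, start, etype = set(), None, None
    for i, t in enumerate(list(tags) + ["O"]):
        if start is not None and not t.startswith("I-"):
            out.add((start, i, etype))
            start, etype = None, None
        if t.startswith("B-"):
            start, etype = i, t[2:]
    return out

def entity_f1(gold_corpus, predictions):
    """Entity-level F1 between gold BIO tags and predicted BIO tags."""
    tp = fp = fn = 0
    for sentence, pred in zip(gold_corpus, predictions):
        gold_spans = entity_spans([label for _, label in sentence])
        pred_spans = entity_spans(pred)
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def cross_corpus_grid(corpora):
    """Train on each corpus and evaluate on every corpus: {(train, test): F1}."""
    results = {}
    for train_name, train_corpus in corpora.items():
        model = train_baseline(train_corpus)
        for test_name, test_corpus in corpora.items():
            preds = [tag_sentence(model, s) for s in test_corpus]
            results[(train_name, test_name)] = entity_f1(test_corpus, preds)
    return results

if __name__ == "__main__":
    # Tiny placeholder data standing in for MUC/CoNLL/BBN/Wikipedia corpora.
    corpora = {
        "conll": [[("John", "B-PER"), ("visited", "O"), ("Paris", "B-LOC")]],
        "wiki":  [[("Paris", "B-LOC"), ("hosted", "O"), ("John", "B-PER")]],
    }
    for (train, test), score in sorted(cross_corpus_grid(corpora).items()):
        print(f"train={train:6s} test={test:6s} F1={score:.2f}")
```

The off-diagonal cells of such a grid (train on one corpus, test on another) are what reveal the cross-corpus performance drop the abstract refers to; the diagonal cells give the usual in-corpus scores.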
Keywords
gold standard corpus, poor cross-corpus performance, cross-corpus evaluation, analysing Wikipedia, Wikipedia corpus, NER training, massive corpus, gold standard, costly manual annotation, entity recognition, comprehensive cross-corpus evaluation, entity annotated text, gold-standard corpus