Named entity recognition in Wikipedia

PWNLP@IJCNLP (2009)

Abstract
Named entity recognition (NER) is used in many domains beyond the newswire text that comprises current gold-standard corpora. Recent work has used Wikipedia's link structure to automatically generate near gold-standard annotations. Until now, these resources have only been evaluated on newswire corpora or themselves. We present the first NER evaluation on a Wikipedia gold standard (WG) corpus. Our analysis of cross-corpus performance on WG shows that Wikipedia text may be a harder NER domain than newswire. We find that an automatic annotation of Wikipedia has high agreement with WG and, when used as training data, outperforms newswire models by up to 7.7%.
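Cross-corpus NER comparisons, and agreement between two annotations of the same text, are commonly reported as exact-match, entity-level precision, recall and F-score in the CoNLL style. The abstract does not say which scoring or agreement measure the authors used, so the following is only a minimal sketch under that assumption. The BIO tag scheme, the PER/LOC/ORG labels and the helper names `bio_to_spans` and `entity_prf` are hypothetical and chosen for illustration. An entity counts as correct only if its boundaries and type both match the reference.

```python
from typing import List, Set, Tuple

# Hypothetical span representation: (start token, end token exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # "O", "B-", or an "I-" whose type differs from the open entity ends the current span.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            # Leniently treat a stray "I-" like "B-" so it starts a new entity.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 of pred against gold."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)  # entities matching in both boundaries and type
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC"]
    pred = ["B-PER", "I-PER", "O", "B-ORG"]  # right boundaries, wrong type on the second entity
    print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Run on a manually annotated corpus, the same function gives a model's score against the gold standard. Run on two annotations of the same text, it gives one simple agreement figure. Mismatched entity types are penalised in full, which is why the second entity above counts as both a false positive and a false negative.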
Keywords
current gold-standard corpus, newswire text, Wikipedia gold standard, entity recognition, newswire model, Wikipedia text, automatic annotation, gold-standard annotation, NER domain, newswire corpus, NER evaluation, gold standard