Transforming Wikipedia into Named Entity Training Data

ALTA(2008)

Abstract
Statistical named entity recognisers require costly hand-labelled training data and, as a result, most existing corpora are small. We exploit Wikipedia to create a massive corpus of named entity annotated text. We transform Wikipedia's links into named entity annotations by classifying the target articles into common entity types (e.g. person, organisation and location). Comparing to MUC, CONLL and BBN corpora, Wikipedia generally performs better than other cross-corpus train/test pairs.
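The link-to-annotation idea can be illustrated with a minimal sketch. This is not the authors' implementation: the `ARTICLE_TYPES` table, the `annotate` function, the tag names and the whitespace tokenisation are all assumptions made for illustration. In particular, the hand-written lookup table stands in for the article classifier the abstract describes, and tagging links to unclassified articles as `O` is a simplification.

```python
import re

# Hypothetical article-to-type table. In the approach described above these
# labels would come from classifying each target article; here they are
# hard-coded purely for illustration.
ARTICLE_TYPES = {
    "Barack Obama": "PER",
    "Hawaii": "LOC",
    "Harvard University": "ORG",
}

# Matches [[Target]] and [[Target|anchor text]] in wiki markup.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def annotate(wikitext, article_types):
    """Return (token, tag) pairs, tagging link text with its target's type."""
    tagged = []
    pos = 0
    for match in LINK.finditer(wikitext):
        # Plain text between links is outside any entity.
        tagged += [(tok, "O") for tok in wikitext[pos:match.start()].split()]
        target = match.group(1).strip()
        anchor = (match.group(2) or match.group(1)).strip()
        # Assumption: links to articles with no known type are left untagged.
        label = article_types.get(target, "O")
        tagged += [(tok, label) for tok in anchor.split()]
        pos = match.end()
    tagged += [(tok, "O") for tok in wikitext[pos:].split()]
    return tagged


sentence = "[[Barack Obama|Obama]] was born in [[Hawaii]] ."
for token, tag in annotate(sentence, ARTICLE_TYPES):
    print(token, tag)
```

Each linked span inherits the entity type of the article it points to, so the anchor text "Obama" is tagged from the article "Barack Obama" rather than from the surface string itself.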