Classifying articles in English and German Wikipedia

ALTA(2013)

Abstract
Named Entity (NE) information is critical for Information Extraction (IE) tasks. However, the cost of manually annotating sufficient data for training purposes, especially for multiple languages, is prohibitive, meaning automated methods for developing resources are crucial. We investigate the automatic generation of NE annotated data in German from Wikipedia. By incorporating structural features of Wikipedia, we can develop a German corpus which accurately classifies Wikipedia articles into NE categories to within 1% F-score of the state-of-the-art process in English.