Tulip: lightweight entity recognition and disambiguation using Wikipedia-based topic centroids.

IR(2014)

Abstract
This article presents Tulip, an ERD system submitted to the ERD 2014: Entity Recognition and Disambiguation Challenge. The objective of the proposed system is to spot mentions of entities in a document and link those mentions to the corresponding Freebase articles. To achieve this, Tulip prunes the set of entity candidates, focusing on a core subset of related entities that captures the context of the document. The relationship strength is measured as the similarity to a topic centroid generated from entity features. Each entity is represented by an accurate and compact feature vector extracted from a category graph built from information in 120 language versions of Wikipedia. Given the core set of accepted entities, Tulip uses the Wikipedia-based feature vectors to extract more related entities from the document text. Tulip received the first prize in the long document track with an F1 score of 0.74, which confirms the effectiveness of our system. At the same time, the system was faster than all other submissions, with latency under 0.29 seconds.
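The centroid-based pruning described in the abstract can be illustrated with a minimal sketch. It is not Tulip's actual implementation: the function names, the sparse dictionary representation, the use of a plain mean as the centroid, cosine as the similarity measure, the threshold value, and the toy feature weights are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse feature vectors (dicts: feature -> weight)."""
    total = defaultdict(float)
    for vec in vectors:
        for feature, weight in vec.items():
            total[feature] += weight
    n = len(vectors)
    return {f: w / n for f, w in total.items()} if n else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_candidates(candidates, threshold=0.5):
    """Keep candidates whose vector is close enough to the topic centroid.

    `candidates` maps a hypothetical entity name to its feature vector.
    The threshold is an illustrative value, not one taken from the paper.
    """
    topic = centroid(list(candidates.values()))
    return {name for name, vec in candidates.items()
            if cosine(vec, topic) >= threshold}


# Toy category-based feature vectors (invented for this example).
candidates = {
    "Tulip (flower)": {"plants": 1.0, "gardening": 0.8},
    "Netherlands": {"countries": 1.0, "gardening": 0.3, "plants": 0.2},
    "Tulip (band)": {"music": 1.0},
}

core = prune_candidates(candidates, threshold=0.5)
```

In this toy input the two candidates that share category features stay in the core set, while the unrelated candidate falls below the threshold and is pruned.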