Learning Entity Types from Query Logs via Graph-Based Modeling.
CIKM'15: 24th ACM International Conference on Information and Knowledge Management Melbourne Australia October, 2015(2015)
摘要
Entities (e.g., person, movie or place) play an important role in real-world applications and learning entity types has attracted much attention in recent years. Most conventional automatic techniques use large corpora, such as news articles, to learn types of entities. However, such text corpora focus on general knowledge about entities in an objective way. Hence, it is difficult to satisfy those users with specific and personalized needs for an entity. Recent years have witnessed an explosive expansion in the mining of search query logs, which contain billions of entities. The word patterns and click-throughs in search logs are not found in text corpora, thus providing a complemental source for discovering entity types based on user behaviors. In this paper, we study the problem of learning entity types from search query logs and address the following challenges: (1) queries are short texts, and information related to entities is usually very sparse; (2) large amounts of irrelevant information exists in search logs, bringing noise in detecting entity types. In this paper, we first model query logs using a bipartite graph with entities and their auxiliary information, such as contextual words and clicked URLs. Then we propose a graph-based framework called ELP (Ensemble framework based on Lable Propagation) to simultaneously learn the types of both entities and auxiliary signals. In ELP, two separate strategies are designed to fix the problems of sparsity and noise in query logs. Extensive empirical studies are conducted on real search logs to evaluate the effectiveness of the proposed ELP framework.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络