A novel approach for entity resolution in scientific documents using context graphs.

Information Sciences (2018)

Abstract
Entity resolution refers to disambiguating and resolving entities in structured and unstructured data. Developing effective resolution algorithms is important for processing scientific documents, particularly biomedical literature. Specifically, name ambiguity among biomedical entities is a primary problem that must be solved in the knowledge extraction process. In this paper, we present a novel approach to disambiguating gene/protein names by using context graphs. A set of document abstracts is used to build the context graphs by exposing the indirect co-occurrence relationships among words. Feature vectors of the graphs are constructed according to the information gain (IG) of each word in the word set. To evaluate the IG values, we propose a new metric that integrates word frequency (WF), dispersion degree (DD), and concentration degree (CD). Finally, entity resolution is performed by applying a support vector machine (SVM). Compared to existing approaches, the proposed method can discover latent information in the context of entity names, rather than relying only on statistical information such as word occurrence counts. Based on comprehensive experiments on two benchmark datasets, we conclude that the proposed method is promising for resolving ambiguous entities compared with several existing solutions.
Keywords
Feature selection, Entity resolution, Context-based graphs, Support vector machines
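
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula that combines WF, DD, and CD, so the sketch scores words with the standard entropy-based information gain instead. The function names, the toy documents, and the sense labels are all hypothetical. The final SVM step is omitted; in practice the selected words would form feature vectors passed to an SVM classifier.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Link two words when they appear in the same document (direct co-occurrence)."""
    graph = defaultdict(set)
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def indirect_neighbors(graph, word):
    """Words two hops away: they share a neighbor with `word` but never co-occur with it."""
    direct = graph.get(word, set())
    two_hop = set()
    for neighbor in direct:
        two_hop |= graph.get(neighbor, set())
    return two_hop - direct - {word}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """Standard IG of a word: H(class) minus H(class | word present or absent).

    Assumption: this stands in for the paper's WF/DD/CD-based metric,
    whose definition is not given in the abstract.
    """
    present = [lab for doc, lab in zip(docs, labels) if word in doc]
    absent = [lab for doc, lab in zip(docs, labels) if word not in doc]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Hypothetical toy data: context words from four abstracts that mention an
# ambiguous name, each labeled with the sense the name takes there.
toy_docs = [
    ["enzyme", "expression"],
    ["expression", "mrna"],
    ["pet", "animal"],
    ["animal", "veterinary"],
]
toy_labels = ["gene", "gene", "other", "other"]

toy_graph = build_cooccurrence_graph(toy_docs)
print(indirect_neighbors(toy_graph, "enzyme"))
print(information_gain(toy_docs, toy_labels, "expression"))
```

In the toy data, "enzyme" and "mrna" never appear in the same abstract, yet both co-occur with "expression"; the two-hop lookup surfaces that indirect relationship, which a plain occurrence count would miss.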