Clustering Scientific Document Based on an Extended Citation Model

IEEE Access (2019)

Abstract
With the number of published scientific papers increasing exponentially, scientific document clustering is becoming a challenging task, and a high-quality clustering model is needed. In this paper, we propose an extended citation model for scientific document clustering. On the one hand, the proposed model assumes that 1) the more frequently and the more widely a scientific document is cited within another document, the higher the similarity between the citing and the cited documents; and 2) the closer together two scientific documents are cited within a third document, the higher the similarity between those two documents. On the other hand, the proposed model combines a citation network and a textual similarity network to improve clustering performance. To evaluate the proposed model, we collect scientific documents in the field of oncology from the PMC and PubMed databases as a case study. Comparison with traditional scientific document clustering models, such as the traditional bibliographic coupling model and the textual similarity model, shows that the proposed model obtains reasonable clustering results in terms of precision, recall, and F1-score.
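The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weighting formula in `citation_weight`, the level values in `PROXIMITY`, the mixing parameter `alpha`, and all function names are assumptions made for illustration. It shows how a citation weight could grow with in-text frequency and section spread, how co-citation proximity could be scored, and how a citation similarity matrix and a textual similarity matrix could be blended and row-normalised into transition probabilities for a random walk.

```python
def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sum(w * w for w in a.values()) ** 0.5
    nb = sum(w * w for w in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def citation_weight(mentions, sections_hit, total_sections):
    """Assumed formula: in-text mention count scaled by section spread."""
    return mentions * (sections_hit / total_sections)


# Assumed proximity scores: closer co-citations count for more.
PROXIMITY = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25}


def proximity_weight(levels):
    """Sum assumed proximity scores over all co-citation occurrences."""
    return sum(PROXIMITY[level] for level in levels)


def combine(citation_sim, text_sim, alpha=0.5):
    """Blend two square similarity matrices with mixing weight alpha."""
    n = len(citation_sim)
    return [[alpha * citation_sim[i][j] + (1 - alpha) * text_sim[i][j]
             for j in range(n)] for i in range(n)]


def transition_matrix(sim):
    """Row-normalise similarities into random-walk transition probabilities."""
    rows = []
    for row in sim:
        total = sum(row)
        rows.append([v / total if total else 0.0 for v in row])
    return rows
```

With `alpha = 1.0` the blend falls back to a citation-only network and with `alpha = 0.0` to a text-only network, which roughly corresponds to the two kinds of baseline the abstract compares against.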
Keywords
Scientific document clustering, citation frequency analysis, citation distribution analysis, citation proximity analysis, textual similarity, random walk algorithm