GKEEP: An Enhanced Graph-Based Keyword Extractor With Error-Feedback Propagation For Geoscience Reports

Earth and Space Science (2021)

Abstract
As the volume of published geoscience literature grows, reading and summarizing large collections of texts has become a challenging task. Publication keywords can be considered basic components of knowledge-structure representations and have been used to reveal knowledge about research domains. Compared with other research domains, however, few studies have addressed keyword extraction from textual geoscience data. In this paper, we propose an unsupervised algorithm, the graph-based keyword extractor with error-feedback propagation (GKEEP), which enhances graph-based keyword extraction by using an error-feedback mechanism similar in concept to backpropagation. The approach proceeds as follows. A preprocessed document is taken as input and represented as a weighted undirected graph, in which vertices represent words and edges represent co-occurrence relationships between words within a given window size. The nodes are then ranked by importance scores computed with a graph-based ranking algorithm. Every word thus receives a score, and these word scores are used to compute the scores of keyword candidates. Next, the Word2Vec method is applied to recalculate the candidate scores, and the candidates are ranked to select the final keywords. The model also uses error feedback to boost the rankings of the most salient terms that would otherwise be deemed less important. In empirical experiments on two real-world data sets (including our newly built data set), GKEEP outperforms state-of-the-art unsupervised models and existing graph-based ranking models. The proposed method can effectively reflect the intrinsic semantics of keywords and their interrelationships.
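The graph-construction and ranking stages described above belong to the TextRank family. The sketch below illustrates only those stages, under stated assumptions: a plain weighted TextRank over a word co-occurrence graph, with each candidate phrase scored as the sum of its word scores. It is not the GKEEP implementation. The Word2Vec re-scoring and error-feedback steps are omitted because the abstract does not specify how they work. The function names, the window size of 2, the damping factor of 0.85 (the usual TextRank default), the summing rule for candidates, and the toy token list are all hypothetical choices.

```python
"""Minimal weighted-TextRank sketch (hypothetical; not the GKEEP implementation)."""


def build_cooccurrence_graph(tokens, window=2):
    """Build a weighted undirected graph as a nested adjacency dict.

    Two distinct words are linked when they appear within `window`
    consecutive tokens. The edge weight counts how often that happens.
    """
    graph = {}
    for i, word in enumerate(tokens):
        graph.setdefault(word, {})
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            graph[word][other] = graph[word].get(other, 0.0) + 1.0
            graph.setdefault(other, {})
            graph[other][word] = graph[other].get(word, 0.0) + 1.0
    return graph


def textrank(graph, damping=0.85, max_iter=100, tol=1e-6):
    """Weighted TextRank: WS(v) = (1 - d) + d * sum_u w(u, v) / W(u) * WS(u),
    where W(u) is the total edge weight attached to neighbor u."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    total_weight = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(max_iter):
        new_scores = {}
        for node, nbrs in graph.items():
            # Each neighbor passes on its score in proportion to the edge weight.
            incoming = sum(
                w / total_weight[nbr] * scores[nbr] for nbr, w in nbrs.items()
            )
            new_scores[node] = (1 - damping) + damping * incoming
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores


def score_candidates(candidates, word_scores):
    """Assumed aggregation rule: a phrase scores the sum of its word scores."""
    return {
        " ".join(phrase): sum(word_scores.get(w, 0.0) for w in phrase)
        for phrase in candidates
    }


# Toy, already-preprocessed token sequence (hypothetical data).
tokens = ["granite", "intrusion", "copper", "deposit",
          "granite", "intrusion", "ore", "deposit"]
graph = build_cooccurrence_graph(tokens, window=2)
word_scores = textrank(graph)
candidates = [("granite", "intrusion"), ("copper", "deposit"), ("ore", "deposit")]
ranked = sorted(score_candidates(candidates, word_scores).items(),
                key=lambda kv: kv[1], reverse=True)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

Plain adjacency dicts keep the sketch dependency-free. On a graph without isolated nodes, `networkx.pagerank(G, alpha=0.85, weight="weight")` should yield scores proportional to these, so either can supply the word scores that a later re-scoring stage would adjust.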
Keywords
backpropagation, error feedback, geoscience reports, keyword extraction, TextRank, Word2Vec