Extraction and evaluation of formulaic expressions used in scholarly papers

Expert Systems with Applications (2022)

Abstract
Formulaic expressions, such as ‘in this paper we propose’, are helpful for authors of scholarly papers because they convey communicative functions; in the example above, the function is ‘showing the aim of this paper’. Thus, resources of formulaic expressions that can be looked up easily, such as a dictionary, would be useful. However, the forms of formulaic expressions often vary widely. For example, ‘in this paper we propose’, ‘in this study we propose’ and ‘in this paper we propose a new method to’ are all regarded as formulaic expressions. This diversity of spans and forms causes problems in both the extraction and the evaluation of formulaic expressions. In this paper, we propose a new approach that is robust to variation in the spans and forms of formulaic expressions. Our approach regards a sentence as consisting of a formulaic part and a non-formulaic part. Then, by extracting formulaic expressions from each sentence rather than from the whole corpus, different forms can be dealt with at once. Based on this formulation, and to avoid the diversity problem, we propose evaluating extraction methods by how well the extracted expressions convey specific communicative functions rather than by comparing them to an existing lexicon. We also propose a new extraction method that utilises named entities and dependency structures to remove the non-formulaic part from a sentence. Experimental results show that the proposed extraction method achieved the best performance compared with existing methods.
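The extraction idea described above (treat a sentence as a formulaic part plus a non-formulaic part, and strip the latter using named entities and dependency structure) can be illustrated with a toy sketch. This is not the authors' algorithm: the `Token` structure, the rule "drop every named-entity token and the whole subtree of any direct object", and the spaCy-style label `dobj` are all hypothetical choices made for illustration. To keep the example runnable without a parser, the head indices and entity tags are supplied by hand rather than produced by a real dependency parser or NER model.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Token:
    text: str
    head: int                  # index of the syntactic head; -1 for the root
    dep: str                   # dependency label (spaCy-style English labels assumed)
    ent: Optional[str] = None  # named-entity tag, or None if not part of an entity


# Hypothetical rule: dependents with these labels (and everything under them)
# are treated as the non-formulaic, content-bearing part of the sentence.
NON_FORMULAIC_DEPS = {"dobj"}


def subtree(tokens: List[Token], idx: int) -> Set[int]:
    """Return idx plus the indices of all its descendants in the dependency tree."""
    result = {idx}
    changed = True
    while changed:  # repeat until no new descendants are found
        changed = False
        for i, tok in enumerate(tokens):
            if tok.head in result and i not in result:
                result.add(i)
                changed = True
    return result


def formulaic_part(tokens: List[Token]) -> str:
    """Remove named entities and non-formulaic subtrees; join what remains."""
    drop: Set[int] = set()
    for i, tok in enumerate(tokens):
        if tok.ent is not None:
            drop.add(i)
        if tok.dep in NON_FORMULAIC_DEPS:
            drop |= subtree(tokens, i)
    return " ".join(tok.text for i, tok in enumerate(tokens) if i not in drop)


# "In this paper we propose a new method", with a hand-written parse.
sentence = [
    Token("In", 4, "prep"),
    Token("this", 2, "det"),
    Token("paper", 0, "pobj"),
    Token("we", 4, "nsubj"),
    Token("propose", -1, "ROOT"),
    Token("a", 7, "det"),
    Token("new", 7, "amod"),
    Token("method", 4, "dobj"),
]
print(formulaic_part(sentence))  # In this paper we propose
```

Working per sentence means each sentence yields its own formulaic span, so variants such as ‘in this paper we propose’ and ‘in this paper we propose a new method to’ reduce to comparable cores without first fixing an n-gram length over the whole corpus. A real system would take heads, labels and entity tags from a parser, and the choice of which relations count as non-formulaic is the crucial design decision that this toy rule only gestures at.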
Keywords
Natural language processing, Formulaic expressions, Multi-word expressions, Writing assistance, English for academic purposes