Keyphrase Extraction for Technical Language Processing

Journal of Research of the National Institute of Standards and Technology (2022)

Abstract
Keyphrase extraction is an important facet of annotation tools that provide the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined TLP keyphrase extraction through the lens of a hypothetical toolkit consisting of a combination of text features and classifiers suitable for use in low-resource TLP applications. We compared two approaches to keyphrase extraction: the first applied our toolkit-based methods, which used only distributional features of words and phrases, and the second was the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from the Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC), and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that our toolkit approach was competitive with Maui when author-provided keyphrases were first removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer achieved an F-measure of 29.4 %, while our toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, our toolkit approach using a Naive Bayes classifier achieved an F-measure of 20.8 %, outperforming Maui's F-measure of 18.8 %.
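The F-measures reported above compare extracted keyphrases against reference (author- or reader-provided) keyphrases. The following is a minimal sketch of how such a score could be computed for a single article. The function name `f_measure` and the matching rule (exact match after lowercasing and whitespace normalization) are assumptions made for illustration; the abstract does not state the matching protocol used in the study.

```python
def f_measure(extracted, gold):
    """Return (precision, recall, F1) for two collections of keyphrases.

    Hypothetical helper: phrases match only if they are identical after
    lowercasing and collapsing whitespace. This rule is an assumption,
    not necessarily the protocol used in the paper.
    """
    def normalize(phrase):
        return " ".join(phrase.lower().split())

    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}

    # Number of extracted phrases that also appear in the reference set.
    matches = len(extracted_set & gold_set)

    precision = matches / len(extracted_set) if extracted_set else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up phrases for illustration only; not data from the study.
    p, r, f = f_measure(
        ["Vapor pressure", "enthalpy", "density"],
        ["vapor  pressure", "heat capacity"],
    )
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

A collection-level score would typically aggregate per-article counts or average per-article scores; which of these the study used is not stated in the abstract.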
Keywords
keyphrase extraction, technical articles, technical language processing