Turkish Keyphrase Extraction from Web Pages with BERT.

SIU (2021)

Abstract
Keyword extraction is a natural language processing task that identifies the essential and descriptive words in a text. Extracting keywords from well-structured texts has been studied extensively in the literature. However, data gathered from websites varies in structure and is difficult to process, so studies in this area are scarce. In this study, data is collected from the websites of 25 large Turkish companies operating in Turkey, and keywords related to these companies are extracted. The proposed deep learning-based model uses Sentence-BERT, a BERT-based method that has recently yielded quite successful results in natural language processing. To evaluate the performance of the proposed method, the data is annotated manually, and the impressive results are reported. In addition, the companies' keywords are clustered in an attempt to detect clues about their business domains. All results and findings of this study are shared at https://turkishkpe.github.io.
Keywords
keyphrase extraction, natural language processing, deep learning, BERT
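
The abstract does not spell out how sentence embeddings are turned into keywords. One common pattern, shown here only as a minimal sketch and not as the authors' method, is to embed the whole text and each candidate phrase, then rank candidates by cosine similarity to the text. Everything named below (`toy_embed`, `candidates`, `extract_keyphrases`, `top_k`, `max_n`) is hypothetical, and `toy_embed` is a character-trigram stand-in used only so the example runs without a model; it is not Sentence-BERT.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding: character-trigram counts (assumption, NOT Sentence-BERT)."""
    padded = f"  {text.lower()}  "
    return dict(Counter(padded[i:i + 3] for i in range(len(padded) - 2)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidates(text: str, max_n: int = 2) -> List[str]:
    """Unique word n-grams (n = 1..max_n), in order of first appearance."""
    tokens = re.findall(r"\w+", text.lower())
    seen: List[str] = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase not in seen:
                seen.append(phrase)
    return seen


def extract_keyphrases(
    text: str,
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 5,
    max_n: int = 2,
) -> List[Tuple[str, float]]:
    """Rank candidate phrases by similarity to the embedding of the full text."""
    doc_vec = embed(text)
    scored = [(p, cosine(embed(p), doc_vec)) for p in candidates(text, max_n)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    page = "online banking and mobile banking services for retail customers"
    for phrase, score in extract_keyphrases(page, top_k=3):
        print(f"{phrase}\t{score:.3f}")
```

To try this with a real sentence encoder, `embed` would be replaced by a function that calls the model, and `cosine` would need to be adapted to the dense vectors such a model returns.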