Systematic Review for Selecting Methods of Document Clustering on Semantic Similarity of Online Laboratories

Saad Hikmat Haji, Karwan Jacksi, Razwan Mohmed Salah

International Conference on Innovations in Computing Research (ICR) (2022)

Abstract
In the era of digitalization, the number of electronic text documents on the Internet has been increasing rapidly. Organizing these documents into meaningful clusters has become a necessity, typically by combining a text representation method (e.g., TF-IDF or word embeddings) with document clustering. Document clustering is the process of dynamically arranging documents into clusters such that the documents within a cluster are very similar to one another and dissimilar to those in other clusters. Traditional clustering algorithms do not take semantic relationships between words into account and therefore do not accurately represent the meaning of documents. For this reason, semantic information has been widely used to improve the quality of document clusters by grouping documents according to their meaning rather than their keywords. This paper systematically reviews twenty-five papers, published over the last seven years (2016 to 2022), on semantic similarity-based document clustering. The algorithms, similarity measures, tools, and evaluation methods used in these papers are discussed as well. The survey shows that researchers used different datasets when applying semantic similarity-based clustering to text similarity. Finally, this paper proposes semantic similarity-based clustering methods that can be used for short-text semantic similarity in online laboratory repositories.
Keywords
Document clustering, Semantic document clustering, Semantic similarity, TF-IDF, Word embedding, Online laboratories
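
As a rough illustration of the keyword-based baseline the abstract contrasts with semantic approaches, the sketch below computes TF-IDF vectors and cosine similarity for a few short texts. It is not taken from the reviewed papers: the function names (`tfidf_vectors`, `cosine`), the whitespace tokenizer, the smoothed IDF variant, and the sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive whitespace tokenization; a real pipeline would also remove
    # stop words and apply stemming or lemmatization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_u * norm_v)


# Made-up short texts standing in for online laboratory descriptions.
docs = [
    "remote laboratory for electronic circuit experiments",
    "online laboratory for circuit experiments",
    "cooking recipes for pasta dishes",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(sim_related > sim_unrelated)
```

Because this measure only counts shared terms, "remote" and "online" contribute nothing to the similarity of the first two texts even though they are closely related in meaning. That gap is the limitation that embedding-based and other semantic similarity measures are meant to address.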