Clustering Document based on Semantic Similarity Using Graph Base Spectral Algorithm

2022 5th International Conference on Engineering Technology and its Applications (IICETA) (2022)

Abstract
The Internet's continued growth has led to a sharp rise in the number of electronic text documents, and grouping these documents into meaningful collections has become crucial. Traditional approaches to document clustering and categorization are based on statistical characteristics and rely on syntactic rather than semantic information. This article introduces a new approach for grouping texts based on their semantic similarity. It relies on a graph-based approach, an efficient technique for clustering. Document summaries, called synopses, are extracted from the Wikipedia and IMDB databases and grouped; the downloaded documents are then processed with the NLTK dictionary, applying several preprocessing steps to make them more convenient to use. Following that, a vector space model is built using TFIDF and converted to a numeric TFIDF matrix, and clustering is performed using spectral methods. The results are compared with previous work.
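The pipeline outlined in the abstract (preprocessing, TFIDF weighting, a similarity graph, spectral clustering) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not say how the graph is built or how many clusters are used, so the cosine-similarity graph, the two-way split, and the smoothed IDF formula are all assumptions. The names `STOP_WORDS`, `SYNOPSES`, `tfidf_vectors`, `similarity_graph`, `spectral_bipartition`, and `cluster_documents` are hypothetical, the stop-word set is a tiny stand-in for NLTK preprocessing, and the synopses are invented toy data.

```python
import math
import random
import re
from collections import Counter

# Tiny stand-in for an NLTK stop-word list (assumption, not the paper's list).
STOP_WORDS = {"a", "in", "this"}

# Invented toy synopses, not taken from Wikipedia or IMDB.
SYNOPSES = [
    "a detective hunts a killer in this crime film",
    "a police detective solves a crime in this film",
    "a wizard trains a young dragon in this fantasy film",
    "a young wizard fights a dragon in this fantasy film",
]


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF dictionary per document."""
    tokenized = [
        [t for t in re.findall(r"[a-z]+", d.lower()) if t not in STOP_WORDS]
        for d in docs
    ]
    n = len(docs)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (assumed variant): terms found everywhere keep weight 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def similarity_graph(vectors):
    """Cosine-similarity adjacency matrix with a zero diagonal."""
    n = len(vectors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = sum(w * vectors[j].get(t, 0.0) for t, w in vectors[i].items())
            W[i][j] = W[j][i] = sim
    return W


def spectral_bipartition(W, iters=500, seed=0):
    """Two-way split by the sign of the second eigenvector of D^-1/2 W D^-1/2."""
    n = len(W)
    # Assumes a connected graph, so every degree is positive.
    s = [math.sqrt(sum(row)) for row in W]
    S = [[W[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    s_norm = math.sqrt(sum(x * x for x in s))
    top = [x / s_norm for x in s]  # leading eigenvector of S (eigenvalue 1)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Deflate: remove the component along the leading eigenvector.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        # Power step on (S + I) / 2, whose eigenvalues are all non-negative.
        v = [0.5 * (v[i] + sum(S[i][j] * v[j] for j in range(n))) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [1 if x > 0 else 0 for x in v]


def cluster_documents(docs):
    """TF-IDF -> similarity graph -> spectral two-way split."""
    return spectral_bipartition(similarity_graph(tfidf_vectors(docs)))


labels = cluster_documents(SYNOPSES)
print(labels)
```

The second eigenvector of the normalized adjacency matrix corresponds to the Fiedler vector of the normalized graph Laplacian, so splitting on its sign is the standard two-cluster spectral cut. Which of the two groups receives label 0 or 1 depends on the random start. For more than two clusters, a common choice is to run k-means on several leading eigenvectors, as library implementations of spectral clustering typically do.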
Keywords
Semantic Similarity, Text Clustering, Spectral Algorithm, TFIDF