Intra-Domain Text Classification: A Hybrid Approach

Soumak Chakraborty, Himadri Mukherjee, Alo Ghosh

Evolution in Computational Intelligence (2023)

Abstract
The amount of textual information in the digital world has been increasing at an enormous rate. This has led to the development of efficient indexing mechanisms for easier retrieval. One of the primary attributes for categorizing texts is their domain. This is challenging because vocabulary is shared across domains, and the challenge grows during deeper sub-domain classification. Such classification is very important, as people (especially students) often need information concerning a particular subject. Systems capable of organizing information by subject can greatly aid efficient retrieval in these scenarios. In this paper, a system is presented to classify educational documents into three subjects: computer science, physics, and mathematics. Experiments were performed with over 13K research papers, and the highest accuracy of 93.35% was obtained for intra-domain classification using a hybrid technique comprising both handcrafted features and deep learning.
Keywords
Subject categorization, LSTM, TF-IDF, Embedding, Hybrid technique