Text categorization using Rocchio algorithm and random forest algorithm

S. Thamarai Selvi, P. Karthikeyan, A. Vincent, V. Abinaya, G. Neeraja, R. Deepika

2016 Eighth International Conference on Advanced Computing (ICoAC) (2017)

Abstract
Millions of files are uploaded and downloaded every minute, creating data at a scale where manual text categorization is not feasible. Hence, there is a need for automatic document categorization that makes storage and retrieval more efficient. This paper proposes a hybrid text categorization model that combines the Rocchio algorithm and the Random Forest algorithm to perform multi-label text categorization. A stop-word remover and a word stemmer are used to overcome the limitations of the Rocchio algorithm. The Random Forest model takes a minimal set of categories as input to reduce its error rate. Experiments were conducted on standard text categorization datasets. The proposed model is found to categorize documents more efficiently than other text categorization models such as fuzzy relevance clustering, ML-KNN (Multi-label KNN), and Naïve Bayes.
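The abstract does not spell out how the two stages are connected, so the following is only a minimal sketch of the Rocchio side under stated assumptions: documents are turned into term-frequency vectors after stop-word removal and stemming, each category is represented by the centroid of its training documents (the common simplification of Rocchio that ignores negative examples), and the top-k categories by cosine similarity are returned as a reduced candidate set that a Random Forest stage could then be trained on. The stop-word list, the `crude_stem` helper, the function names, and the parameter `k` are all hypothetical and are not taken from the paper. The Random Forest stage itself is not shown.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on"}


def crude_stem(word):
    # Hypothetical stand-in for a real stemmer such as Porter's:
    # strips one common suffix if at least three characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    # Lowercase, keep alphabetic tokens, drop stop words, then stem.
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_vector(text):
    # Sparse term-frequency vector as a Counter.
    return Counter(preprocess(text))


def cosine(u, v):
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def train_centroids(labeled_docs):
    # labeled_docs: iterable of (text, [labels]); a document may carry
    # several labels, so it contributes to every one of its categories.
    sums = defaultdict(Counter)
    counts = Counter()
    for text, labels in labeled_docs:
        vec = tf_vector(text)
        for label in labels:
            sums[label].update(vec)
            counts[label] += 1
    return {
        label: {term: w / counts[label] for term, w in total.items()}
        for label, total in sums.items()
    }


def candidate_categories(text, centroids, k=2):
    # Rank categories by similarity to their centroid and keep the top k.
    vec = tf_vector(text)
    ranked = sorted(centroids, key=lambda c: cosine(vec, centroids[c]), reverse=True)
    return ranked[:k]


training = [
    ("the team scored a goal in the match", ["sports"]),
    ("players trained for the football match", ["sports"]),
    ("the market and stocks rallied", ["finance"]),
    ("investors traded stocks and bonds", ["finance"]),
]
centroids = train_centroids(training)
print(candidate_categories("football players scored goals", centroids, k=1))
```

In a fuller pipeline, the candidate list returned here would restrict the label space seen by the second-stage classifier; whether the paper does exactly this is an assumption of the sketch.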
Keywords
text categorization, information retrieval, decision trees, vector space model