Using boosting mechanism to refine the threshold of VSM-based similarity in text classification

Intelligent Control and Automation, 2002. Proceedings of the 4th World Congress  (2002)

Abstract
The vector space model (VSM)-based similarity classifier is the simplest text categorization method. It classifies quickly but with low accuracy, mainly because its similarity threshold is chosen empirically rather than derived mathematically. This paper introduces a boosting-based mechanism that adaptively computes a more accurate similarity threshold for a specific dataset. The method constructs better similarity-based classification rules by combining the similarity thresholds generated by the constituent classifiers of boosting. Because it greedily minimizes the error rate on the training documents, the similarity classifier that uses the resulting threshold should also have a low error rate.
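The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the paper's actual method. It assumes a binary (one category versus rest) setting, cosine similarity to a category centroid, AdaBoost-style reweighting with one-dimensional threshold rules as weak classifiers, and a combined threshold taken as the smallest observed similarity at which the weighted vote turns positive. All function names, the toy vectors, and the number of rounds are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two term-weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(score, theta):
    """Threshold rule: +1 (in category) if the similarity reaches theta."""
    return 1 if score >= theta else -1

def best_threshold(scores, labels, weights):
    """Greedy weak learner: the threshold with the lowest weighted error."""
    best = None
    for theta in sorted(set(scores)) + [float("inf")]:
        err = sum(w for s, y, w in zip(scores, labels, weights)
                  if predict(s, theta) != y)
        if best is None or err < best[1]:
            best = (theta, err)
    return best

def boost_thresholds(scores, labels, rounds=10):
    """AdaBoost-style loop (an assumption); returns [(threshold, alpha), ...]."""
    n = len(scores)
    weights = [1.0 / n] * n
    rules = []
    for _ in range(rounds):
        theta, err = best_threshold(scores, labels, weights)
        if err >= 0.5:          # no better than chance: stop
            break
        err = max(err, 1e-10)   # avoid division by zero on a perfect rule
        alpha = 0.5 * math.log((1 - err) / err)
        rules.append((theta, alpha))
        # Raise the weight of misclassified documents, lower the rest.
        weights = [w * math.exp(-alpha * y * predict(s, theta))
                   for s, y, w in zip(scores, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return rules

def combined_threshold(rules, scores):
    """Smallest observed score at which the weighted vote is positive."""
    for s in sorted(set(scores)):
        if sum(a * predict(s, t) for t, a in rules) > 0:
            return s
    return float("inf")

# Toy data (hypothetical): a category centroid and four training documents.
centroid = [1.0, 1.0, 0.0]
docs = [[1, 1, 0], [2, 1, 0], [0, 0, 1], [0, 1, 3]]
labels = [1, 1, -1, -1]
scores = [cosine(d, centroid) for d in docs]
theta = combined_threshold(boost_thresholds(scores, labels), scores)
print(theta)
```

Because every weak rule here has the same form (similarity at or above a threshold means "in category"), the weighted vote never decreases as similarity grows, so the ensemble collapses to a single refined threshold. That keeps the classification speed of the plain similarity classifier.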
Keywords
boosting learning, similarity, text categorization, category theory, automation, boosting, error rate, vector space model, machine learning, computer science, learning artificial intelligence, intelligent control, intelligent systems, information retrieval