An Improved Approach to Terms Weighting in Text Classification

Computer and Management (2011)

Abstract
Most traditional text classification methods use term frequency (tf) and inverse document frequency (idf) to represent the importance of terms and to compute their weights when classifying a text document. Term weighting plays an important role in achieving high text classification performance. Although the tf-idf model is popular, it does not take the class information of terms into account. This paper proposes an improved tf-idf-ci model for computing term weights that combines intra-class information and inner-class information. Experimental results show that the model improves performance: the weights of important, representative terms are raised, and the influence of unimportant feature terms on classification is reduced. In addition, the F1 score obtained with the tf-idf-ci algorithm is higher than that obtained with the traditional tf-idf model.
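For reference, the baseline the abstract improves on can be sketched as follows. This is a minimal illustration of the textbook tf-idf weighting (raw count times log(N / df)) and the F1 measure used for comparison; it is not the paper's tf-idf-ci model, whose formula is not given here. The function names `tf_idf` and `f1_score` and the toy documents are hypothetical. Note that nothing in `tf_idf` looks at class labels, which is the gap the tf-idf-ci model is meant to close.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed textbook variant (not the paper's tf-idf-ci):
      tf  = raw count of the term in the document
      idf = log(N / df), N = number of documents,
            df = number of documents containing the term
    Class labels are never consulted.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy corpus of three tokenized documents.
    docs = [
        ["cheap", "loan", "offer", "loan"],
        ["meeting", "agenda", "offer"],
        ["loan", "rate", "meeting"],
    ]
    w = tf_idf(docs)
    print(round(w[0]["cheap"], 3))  # appears in 1 of 3 docs: log(3) -> 1.099
    print(round(w[0]["loan"], 3))   # tf=2, df=2: 2*log(1.5) -> 0.811
```

A term that occurs in every document receives idf = log(1) = 0 and therefore weight 0, regardless of how strongly it is associated with any one class.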
Keywords
pattern classification, text analysis, inner class information, intra class information, inverse document frequency, term frequency, terms weighting, text document classification, tf-idf model, machine learning, computational modeling, classification algorithms