A Correlation-Based Feature Weighting Filter for Naive Bayes

IEEE Trans. Knowl. Data Eng. (2019)

Citations: 217 | Views: 69
Abstract
Due to its simplicity, efficiency, and efficacy, naive Bayes (NB) has continued to be one of the top 10 algorithms in the data mining and machine learning community. Among the numerous approaches to alleviating its conditional independence assumption, feature weighting places more emphasis on highly predictive features than on less predictive ones. In this paper, we argue that, for NB, highly predictive features should be highly correlated with the class (maximum mutual relevance) yet uncorrelated with other features (minimum mutual redundancy). Based on this premise, we propose a correlation-based feature weighting (CFW) filter for NB. In CFW, the weight for a feature is a sigmoid transformation of the difference between the feature-class correlation (mutual relevance) and the average feature-feature intercorrelation (average mutual redundancy). Experimental results show that NB with CFW significantly outperforms NB and all the other state-of-the-art feature weighting filters used for comparison. Compared to feature weighting wrappers for improving NB, the main advantages of CFW are its low computational complexity (no search involved) and the fact that it maintains the simplicity of the final model. In addition, we apply CFW to text classification and achieve remarkable improvements.
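The weighting rule described in the abstract lends itself to a compact illustration. The Python sketch below is not the authors' implementation: it assumes plain mutual information (scikit-learn's mutual_info_score) as the correlation measure for discrete features, whereas the paper may use a different or normalized measure, and the function name cfw_weights is illustrative only.

```python
# Minimal sketch of the CFW idea: weight = sigmoid(relevance - average redundancy).
# Assumes discrete feature values and mutual information as the correlation measure;
# the paper's exact correlation measure and normalization may differ.
import numpy as np
from sklearn.metrics import mutual_info_score


def cfw_weights(X, y):
    """Compute correlation-based feature weights for an (n_samples, n_features)
    array of discrete features X and a class-label vector y."""
    n_features = X.shape[1]

    # Mutual relevance: correlation between each feature and the class.
    relevance = np.array(
        [mutual_info_score(X[:, i], y) for i in range(n_features)]
    )

    # Average mutual redundancy: mean correlation with all other features.
    redundancy = np.zeros(n_features)
    for i in range(n_features):
        others = [
            mutual_info_score(X[:, i], X[:, j])
            for j in range(n_features) if j != i
        ]
        redundancy[i] = np.mean(others) if others else 0.0

    # Sigmoid transformation of the relevance-redundancy difference.
    return 1.0 / (1.0 + np.exp(-(relevance - redundancy)))
```

In a feature-weighted NB classifier, such weights are commonly applied as exponents on the per-feature conditional probabilities, i.e., P(c) ∏_i P(x_i | c)^{w_i}; whether CFW adopts exactly this form is not specified in the abstract.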
Keywords
Redundancy, Feature extraction, Decision trees, Correlation, Mathematical model, Electronic mail, Training