Feature selection for high dimensional imbalanced class data based on F-measure optimization

2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC 2017), published 2018

Abstract
Feature selection aims to eliminate redundant attributes and improve classification accuracy. This is a challenging problem, especially for imbalanced data. Traditional feature selection methods ignore class imbalance, so the selected features are biased towards the majority class and miss features that are significant for the minority class. Because the F-measure is better suited than accuracy to imbalanced data classification, we propose to use it as the optimization target of the feature selection algorithm. This paper introduces a novel feature selection method, SSVM-FS, which is based on a structural support vector machine classifier that optimizes the F-measure. Features are selected according to the weight vector of the SSVM, which takes the class imbalance problem into account. On this basis, we develop a comprehensive feature ranking method that integrates the SSVM weight vector with symmetric uncertainty. The comprehensive score is used to reduce the feature set to a suitable size, and a harmony search then finds the optimal combination of features for predicting the target class label. The feature subset selected by the proposed method represents both the majority and the minority class and is less redundant. Experimental results on six high-dimensional, class-imbalanced microarray data sets show that this method is effective for imbalanced classification. © 2017 IEEE.
Keywords
feature selection, class-imbalanced data, F-measure, structural SVM, harmony search
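
The abstract outlines a three-stage pipeline: score each feature by combining an SVM weight vector with symmetric uncertainty, keep the top-ranked features, and run a harmony search over that reduced set to maximize the F-measure. The sketch below is a minimal, hypothetical illustration of that pipeline, not the authors' implementation: a class-weighted linear SVM stands in for the F-measure-optimizing structural SVM, features are discretized into equal-width bins for symmetric uncertainty, and the function names and harmony-search parameters (hm_size, hmcr, par, n_iters) are illustrative assumptions.

```python
# Illustrative sketch only: class-weighted linear SVM as a stand-in for the
# F-measure-optimizing structural SVM; parameter values are arbitrary defaults.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score, mutual_info_score
from sklearn.model_selection import train_test_split


def symmetric_uncertainty(x, y, bins=10):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), with X discretized into equal-width bins."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins=bins)[1:-1])
    mi = mutual_info_score(xd, y)
    entropy = lambda v: -sum(p * np.log(p) for p in np.bincount(v) / len(v) if p > 0)
    denom = entropy(xd) + entropy(y.astype(int))
    return 2.0 * mi / denom if denom > 0 else 0.0


def composite_scores(X, y, alpha=0.5):
    """Blend |w| from a class-weighted linear SVM with per-feature symmetric uncertainty."""
    svm = LinearSVC(class_weight="balanced", max_iter=5000).fit(X, y)
    w = np.abs(svm.coef_).ravel()
    su = np.array([symmetric_uncertainty(X[:, j], y) for j in range(X.shape[1])])
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-12)
    return alpha * norm(w) + (1 - alpha) * norm(su)


def harmony_search(X, y, candidates, hm_size=10, hmcr=0.9, par=0.3, n_iters=200, seed=0):
    """Search binary feature masks over the pre-ranked candidates, maximizing validation F1."""
    rng = np.random.default_rng(seed)
    Xc = X[:, candidates]
    Xtr, Xva, ytr, yva = train_test_split(Xc, y, test_size=0.3, stratify=y, random_state=seed)

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        clf = LinearSVC(class_weight="balanced", max_iter=5000).fit(Xtr[:, mask], ytr)
        return f1_score(yva, clf.predict(Xva[:, mask]))

    memory = [rng.random(len(candidates)) < 0.5 for _ in range(hm_size)]
    scores = [fitness(m) for m in memory]
    for _ in range(n_iters):
        new = np.zeros(len(candidates), dtype=bool)
        for j in range(len(candidates)):
            if rng.random() < hmcr:                      # take this bit from harmony memory
                new[j] = memory[rng.integers(hm_size)][j]
                if rng.random() < par:                   # pitch adjustment: flip the bit
                    new[j] = not new[j]
            else:                                        # random consideration
                new[j] = rng.random() < 0.5
        s = fitness(new)
        worst = int(np.argmin(scores))
        if s > scores[worst]:                            # replace the worst harmony
            memory[worst], scores[worst] = new, s
    best = memory[int(np.argmax(scores))]
    return np.asarray(candidates)[best]
```

Usage would follow the order described in the abstract: rank all features with composite_scores, keep the indices of the top-k scores as the candidate pool, and pass that pool to harmony_search to obtain the final feature subset.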