Aggressive and effective feature selection using genetic programming.

IEEE Congress on Evolutionary Computation (2012)

Abstract
One of the major challenges in automatic classification is dealing with high-dimensional data. Several dimensionality reduction strategies, including popular feature selection metrics such as Information Gain and χ², have already been proposed to deal with this situation. However, these strategies are not well suited to very skewed data, a common situation in real-world data sets. Skew occurs when the number of samples in one class is much larger than in the others, causing common feature selection metrics to be biased towards the features observed in the largest class. In this paper, we propose the use of Genetic Programming (GP) to implement an aggressive, yet very effective, selection of attributes. Our GP-based strategy is able to greatly reduce dimensionality while dealing effectively with skewed data. To this end, we exploit some of the most common feature selection metrics and, with GP, combine their results into new sets of features, obtaining a better, unbiased estimate of the discriminative power of each feature. Our proposal was evaluated against each individual feature selection metric used in our GP-based solution (namely, Information Gain, χ², Odds-Ratio, and Correlation Coefficient) using the k8 cancer-rescue mutants data set, a very unbalanced collection of examples of the p53 protein. For this data set, our solution not only increases the efficiency of the learning algorithms, through an aggressive reduction of the input space, but also significantly increases their accuracy.
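To make the four metrics named in the abstract concrete, here is a minimal Python sketch that computes them for one binary feature against a binary class from a 2x2 contingency table. The formulas are the common textbook definitions and may differ from the exact variants used in the paper; in particular, the correlation coefficient is taken here to be the phi coefficient, and the odds ratio uses 0.5 smoothing to avoid division by zero. The function `combined_score` is purely hypothetical: it is a hand-written expression standing in for the kind of formula GP might evolve, not a result from the paper.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def metrics(tp, fp, fn, tn):
    """Score one binary feature against a binary class.

    tp: feature present, positive class   fp: feature present, negative class
    fn: feature absent, positive class    tn: feature absent, negative class
    """
    n = tp + fp + fn + tn
    present, absent = tp + fp, fn + tn
    pos, neg = tp + fn, fp + tn

    # Information gain: H(class) - H(class | feature)
    h_class = entropy([pos, neg])
    h_cond = (present / n) * entropy([tp, fp]) + (absent / n) * entropy([fn, tn])
    ig = h_class - h_cond

    # Chi-squared and phi coefficient share the same building blocks
    denom = present * absent * pos * neg
    diff = tp * tn - fp * fn
    chi2 = n * diff ** 2 / denom if denom else 0.0
    corr = diff / math.sqrt(denom) if denom else 0.0

    # Odds ratio with 0.5 smoothing (an assumption, to handle zero cells)
    odds = ((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5))

    return {"ig": ig, "chi2": chi2, "odds_ratio": odds, "corr": corr}


def combined_score(m):
    """Hypothetical combination of metrics; a GP run would search for such an expression."""
    return m["ig"] * math.log1p(m["odds_ratio"]) + abs(m["corr"])


perfect = metrics(tp=10, fp=0, fn=0, tn=10)      # feature separates the classes
useless = metrics(tp=5, fp=5, fn=5, tn=5)        # feature independent of class
print(combined_score(perfect) > combined_score(useless))
```

In a GP setting, each individual in the population would be an expression tree over these metric values, and its fitness would be measured by how well a classifier performs using only the top-ranked features under that expression.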
Keywords
biology computing, cancer, genetic algorithms, learning (artificial intelligence), pattern classification, proteins, χ², GP-based solution, GP-based strategy, automatic classification, correlation coefficient, dimensionality reduction strategies, feature selection metrics, genetic programming, highly dimensional data, information gain, k8 cancer-rescue mutants data set, learning algorithms, odds-ratio, p53 protein