Evolutionary Undersampling For Imbalanced Big Data Classification

I. Triguero, M. Galar, S. Vluymans, C. Cornelis, H. Bustince, F. Herrera, Y. Saeys

2015 IEEE Congress on Evolutionary Computation (CEC) (2015)


Abstract

Classification techniques for big data are in high demand across a wide variety of applications. The huge increase in available data may limit the applicability of most standard techniques. The problem becomes even more difficult when the class distribution is skewed, a setting known as imbalanced big data classification. Evolutionary undersampling techniques have been shown to be a very promising solution to the class imbalance problem. However, their practical application is limited to problems with no more than tens of thousands of instances. In this contribution we design a parallel model that enables evolutionary undersampling methods to deal with large-scale problems. To do this, we rely on a MapReduce scheme that distributes the execution of these algorithms across a cluster of computing elements. Moreover, we develop a windowing approach for class-imbalanced data in order to speed up the undersampling process without losing accuracy. In our experiments we test the capabilities of the proposed scheme on several data sets with up to 4 million instances. The results show promising scalability for evolutionary undersampling within the proposed framework.
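The abstract describes the scheme only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. The data are split into partitions, each map task runs an evolutionary undersampling search over its own majority-class instances, and the reduce step concatenates the balanced subsets. Everything specific here is an assumption: a plain generational genetic algorithm (the paper's evolutionary algorithm may differ), a fitness equal to the leave-one-out 1-NN geometric mean minus a class-balance penalty with P = 0.2, stratified round-robin partitioning, and a windowing rule that evaluates all minority instances but only one majority stratum per generation. All function names (`evolve`, `map_task`, `mapreduce_eus`) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: (feature vector, label); 1 = minority, 0 = majority.
Instance = Tuple[Tuple[float, ...], int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_1nn(query: Instance, reference: List[Instance]) -> int:
    """Leave-one-out 1-NN: the query object itself is skipped if it is in the reference set."""
    candidates = [r for r in reference if r is not query]
    if not candidates:
        return 0
    return min(candidates, key=lambda r: sq_dist(r[0], query[0]))[1]


def fitness(mask: List[bool], minority: List[Instance], majority: List[Instance],
            window: List[int], penalty: float = 0.2) -> float:
    """Assumed fitness: G-mean of 1-NN on (all minority + one majority window) minus a balance penalty."""
    reference = minority + [m for m, keep in zip(majority, mask) if keep]
    recalls = []
    for eval_set in (minority, [majority[i] for i in window]):
        hits = sum(loo_1nn(q, reference) == q[1] for q in eval_set)
        recalls.append(hits / len(eval_set))
    g_mean = math.sqrt(recalls[0] * recalls[1])
    n_kept = sum(mask)
    if n_kept == 0:
        return g_mean - penalty
    return g_mean - abs(1 - len(minority) / n_kept) * penalty


def evolve(minority: List[Instance], majority: List[Instance], generations: int = 20,
           pop_size: int = 20, n_windows: int = 3, p_mut: float = 0.05,
           rng: Optional[random.Random] = None) -> List[Instance]:
    """Simple generational GA with assumed majority-class windowing."""
    if rng is None:
        rng = random.Random(0)
    order = list(range(len(majority)))
    rng.shuffle(order)
    # Disjoint strata of the majority class; one stratum is used per generation.
    windows = [w for w in (order[k::n_windows] for k in range(n_windows)) if w]
    # Each chromosome is a bit mask over majority instances (True = keep).
    pop = [[rng.random() < 0.5 for _ in majority] for _ in range(pop_size)]
    best = pop[0]
    for gen in range(generations):
        window = windows[gen % len(windows)]
        ranked = sorted(pop, key=lambda m: fitness(m, minority, majority, window),
                        reverse=True)
        best = ranked[0]
        parents = ranked[: max(2, pop_size // 2)]  # truncation selection
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            children.append([not bit if rng.random() < p_mut else bit for bit in child])  # bit-flip
        pop = children
    # The minority class is always kept; only majority instances are undersampled.
    return minority + [m for m, keep in zip(majority, best) if keep]


def map_task(partition: List[Instance], seed: int) -> List[Instance]:
    """Map: undersample one partition independently of the others."""
    minority = [x for x in partition if x[1] == 1]
    majority = [x for x in partition if x[1] == 0]
    if not minority or not majority:
        return partition  # nothing to balance in this chunk
    return evolve(minority, majority, rng=random.Random(seed))


def mapreduce_eus(data: List[Instance], n_maps: int = 4, seed: int = 0) -> List[Instance]:
    minority = [x for x in data if x[1] == 1]
    majority = [x for x in data if x[1] == 0]
    # Stratified round-robin split so every map task sees both classes when possible.
    partitions = [minority[k::n_maps] + majority[k::n_maps] for k in range(n_maps)]
    # Map phase: run sequentially here; each call is independent and could run in parallel.
    outputs = [map_task(p, seed + k) for k, p in enumerate(partitions)]
    # Reduce phase: concatenate the balanced subsets into one training set.
    return [x for out in outputs for x in out]
```

In a cluster deployment each `map_task` call would run as a separate map task; here the map phase is a plain list comprehension so the sketch stays self-contained. Cycling through majority strata keeps each fitness evaluation cheaper than scoring against the whole partition, which is the kind of speed-up the windowing approach targets.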


Keywords

imbalanced Big Data classification, evolutionary undersampling methods, MapReduce, windowing approach, genetic algorithm
