EBSMOTE: Evaluation-Based Synthetic Minority Oversampling TEchnique for Imbalanced Dataset Learning

ISKE (2019)

Abstract
Imbalanced data pose a tremendous challenge to standard machine learning classifiers, which assume balanced training data. These methods tend to classify majority samples accurately, whereas minority samples are often misclassified. Oversampling techniques are commonly used in the literature to address this problem. In a complex data environment, however, most oversampling techniques may fail to identify useful original minority instances for oversampling; the consequence is that needless synthetic samples may be generated that obstruct the class boundary and aggravate the learning task. To handle this issue, a novel method called Evaluation-Based Synthetic Minority Oversampling TEchnique (EBSMOTE) is proposed to address the imbalanced data problem in five key steps: 1) synthetic instances are created with SMOTE; 2) synthetic instances are sorted as good or bad based on the estimated rate of variance between the synthetic instances and the original minority and majority samples; 3) the best synthetic instances are selected from the good ones; 4) a criterion is defined to reject synthetic instances that are likely to be noisy; 5) new synthetic instances are created from the good ones using the SMOTE algorithm, and the best set of synthetic instances obtained is added to constitute the balanced training data. The approach is extensively evaluated on several real-world datasets with different imbalance ratios. The C4.5 and Logistic Regression (LR) classifiers are employed for experimental validation. The empirical results show that the proposed method is superior to the compared ones.
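The generate-evaluate-filter pipeline described above can be sketched in plain numpy. This is a minimal illustration, not the paper's implementation: the SMOTE step is the standard k-nearest-neighbor interpolation, but the evaluation score used to separate "good" from "bad" candidates (here, a simple centroid-distance margin) is a hypothetical stand-in for the paper's variance-based criterion.

```python
import numpy as np

def smote_sample(minority, n_new, k=3, rng=None):
    """Standard SMOTE: interpolate each new point between a random
    minority sample and one of its k nearest minority neighbors."""
    rng = rng or np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        d = np.linalg.norm(minority - minority[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]        # skip self at index 0
        j = rng.choice(nbrs)
        lam = rng.random()                   # interpolation factor in [0, 1)
        out.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(out)

def ebsmote_sketch(X_min, X_maj, n_new, k=3, rng=None):
    """Generate excess candidates, score them, and keep the best n_new.
    The score below is an illustrative placeholder for the paper's
    evaluation criterion: candidates farther from the majority centroid
    and closer to the minority centroid rank higher."""
    rng = rng or np.random.default_rng(0)
    cand = smote_sample(X_min, 2 * n_new, k, rng)   # over-generate candidates
    c_min, c_maj = X_min.mean(axis=0), X_maj.mean(axis=0)
    score = (np.linalg.norm(cand - c_maj, axis=1)
             - np.linalg.norm(cand - c_min, axis=1))
    keep = np.argsort(-score)[:n_new]               # reject low-scoring (noisy) ones
    return cand[keep]
```

In a full pipeline the selected samples would be appended to the minority class before training the classifier (e.g. C4.5 or LR, as in the paper's experiments).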
Keywords
Class Imbalance, Machine Learning, Classification, SMOTE