Towards hybrid over- and under-sampling combination methods for class imbalanced datasets: an experimental study

Artificial Intelligence Review (2022)

Abstract
The skewed class distributions of many class-imbalanced domain datasets often make it difficult for machine learning techniques to construct effective models. In such cases, data re-sampling techniques, such as under-sampling the majority class and over-sampling the minority class, are usually employed. In the related literature, some studies have shown that hybrid combinations of under- and over-sampling methods applied in different orders can produce better results. However, each study compares only against either under- or over-sampling methods alone to reach its final conclusion. Therefore, the research objective of this paper is to determine which order of combining under- and over-sampling methods performs better. Experiments are conducted on 44 datasets from different domains using three over-sampling algorithms (SMOTE, CTGAN, and TAN) and three under-sampling (i.e., instance selection) algorithms (IB3, DROP3, and GA). The results show that if the under-sampling algorithm is chosen carefully, i.e., IB3, no significant performance improvement is obtained by further adding an over-sampling step. Furthermore, with the IB3 algorithm, it is better to perform instance selection first and over-sampling second rather than the reverse order, which allows the random forest classifier to achieve the highest AUC.
Keywords
Data science, Machine learning, Class imbalance, Under-sampling, Over-sampling
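The combination order the abstract reports as best (instance selection on the majority class first, then over-sampling the minority class) can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: random under-sampling and a SMOTE-style linear interpolation stand in for the IB3/DROP3/GA and SMOTE/CTGAN/TAN algorithms actually studied, and all function names and parameters are illustrative.

```python
import numpy as np

def undersample_majority(X, y, majority_label, rng, keep_ratio=0.5):
    """Randomly keep a fraction of majority-class instances.

    A simplified stand-in for instance selection methods such as IB3.
    """
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    kept = rng.choice(maj, size=int(len(maj) * keep_ratio), replace=False)
    idx = np.concatenate([kept, mino])
    return X[idx], y[idx]

def smote_like_oversample(X, y, minority_label, rng, n_new):
    """Synthesize minority points on segments between random minority pairs.

    A simplified SMOTE-style interpolation (real SMOTE interpolates toward
    k-nearest minority neighbors rather than arbitrary pairs).
    """
    mino = X[y == minority_label]
    i = rng.integers(0, len(mino), size=n_new)
    j = rng.integers(0, len(mino), size=n_new)
    gap = rng.random((n_new, 1))
    synth = mino[i] + gap * (mino[j] - mino[i])
    X_new = np.vstack([X, synth])
    y_new = np.concatenate([y, np.full(n_new, minority_label)])
    return X_new, y_new

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)  # 90 majority, 10 minority

# Under-sampling first, over-sampling second (the better order per the paper).
Xu, yu = undersample_majority(X, y, majority_label=0, rng=rng, keep_ratio=0.5)
n_new = int((yu == 0).sum() - (yu == 1).sum())
Xb, yb = smote_like_oversample(Xu, yu, minority_label=1, rng=rng, n_new=n_new)
```

After both steps the two classes are balanced; the resulting `(Xb, yb)` would then be used to train a classifier such as random forest, as in the paper's experiments.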