Oversampling Methods for Handling Imbalance Data in Binary Classification.

Theodorus Riston, Sandi Nurhibatulloh Suherman, Yonnatan Yonnatan, Fajar Indrayatna, Anindya Apriliyanti Pravitasari, Eka Novita Sari, Tutut Herawan

ICCSA (Workshops 2) (2023)

Abstract
Data preparation accounts for the majority of the work in data science, about 60–80%. Careful data preparation yields accurate information for decision making, which is why it is so critical in data science. In practice, however, data do not always follow a predefined parametric distribution, and they can also be imbalanced. Imbalanced data cause many problems, especially in classification. This study employs several oversampling methods in machine learning, i.e., Random Oversampling (ROS), Adaptive Synthetic Sampling (ADASYN), Synthetic Minority Over-sampling Technique (SMOTE), and Borderline-SMOTE (B-SMOTE), to handle imbalanced data in binary classification with Naïve Bayes and Support Vector Machine (SVM). The methods are run under the same experimental design and compared in search of the best and most accurate model for the datasets. Evaluation is based on confusion matrices, with precision, recall, and F1-score calculated for comparison. ROC curves and AUC values are also provided as figures to evaluate the performance of each method. The results show that SVM with B-SMOTE has better classification performance, especially on datasets where the minority and majority classes have highly similar characteristics.
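As a rough illustration of the simplest method named in the abstract, the sketch below shows Random Oversampling (ROS) together with precision, recall, and F1-score computed from confusion-matrix counts. It is a minimal standard-library sketch, not the authors' implementation: the function names `random_oversample` and `precision_recall_f1`, the toy data, and the choice of label 1 as the positive (minority) class are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Rows of the original data that belong to this class.
        pool = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class, computed from
    the TP, FP, and FN cells of the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy imbalanced data (assumed): 8 majority rows, 2 minority rows.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [2.0], [2.1]]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)  # both classes now have 8 rows
```

Oversampling is normally applied to the training split only, so that duplicated or synthetic minority rows do not leak into the data used for evaluation.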
Keywords
handling imbalance data, classification