Predictive Models with Resampling: A Comparative Study of Machine Learning Algorithms and their Performances on Handling Imbalanced Datasets

ICMLA (2019)

Abstract
Class imbalance is a crucial challenge in many real-world machine learning applications. Traditional machine learning algorithms are likely to produce good accuracy scores on such datasets because of an obvious bias towards the majority class. Accuracy is therefore a misleading measure of performance for algorithms working on imbalanced data, since a classifier can score well while having poor predictive accuracy on the minority class. While previous work has used several resampling techniques to improve predictive accuracy on the minority class, in this study we explore and compare the effectiveness of the Synthetic Minority Oversampling Technique (SMOTE) and Random Oversampling across multiple learning algorithms and resampling ratios, using eight different performance measures on two datasets from diverse domains, medicine and engineering. The results show that the effectiveness of these resampling techniques depends jointly on the learning algorithm, the resampling ratio, and the inherent characteristics of the dataset. The appropriate choice of performance measure for evaluating models built with these resampling techniques also varies, which provides information useful for future research and applications.
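To make the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows why accuracy can look good on imbalanced data, and how random oversampling and a SMOTE-style interpolation raise the minority count to a chosen resampling ratio. The function names, the toy data, the `ratio` parameter (minority count divided by majority count after resampling), and the neighbour count `k` are all assumptions made for illustration; the abstract does not give these details.

```python
import random
from collections import Counter


def _n_needed(y, minority, ratio):
    """How many minority rows to add so that minority/majority reaches `ratio`."""
    n_minority = sum(1 for label in y if label == minority)
    n_majority = len(y) - n_minority
    return max(0, round(ratio * n_majority) - n_minority)


def random_oversample(X, y, minority, ratio, rng):
    """Random oversampling: duplicate randomly chosen minority rows."""
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_new = _n_needed(y, minority, ratio)
    extra = [list(rng.choice(minority_rows)) for _ in range(n_new)]
    return X + extra, y + [minority] * n_new


def smote_like_oversample(X, y, minority, ratio, rng, k=3):
    """SMOTE-style oversampling (simplified): interpolate between a minority
    row and one of its k nearest minority neighbours."""
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_new = _n_needed(y, minority, ratio)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * n_new


# Hypothetical toy data: 90 majority rows (label 0), 10 minority rows (label 1).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

# A classifier that always predicts the majority class looks accurate
# but never identifies a minority example.
y_pred = [0] * len(y)
accuracy = sum(t == p for t, p in zip(y, y_pred)) / len(y)
minority_recall = sum(t == 1 and p == 1 for t, p in zip(y, y_pred)) / y.count(1)

X_ros, y_ros = random_oversample(X, y, minority=1, ratio=1.0, rng=rng)
X_sm, y_sm = smote_like_oversample(X, y, minority=1, ratio=0.5, rng=rng)

print("accuracy:", accuracy, "minority recall:", minority_recall)
print("random oversampling, ratio 1.0:", Counter(y_ros))
print("SMOTE-style, ratio 0.5:", Counter(y_sm))
```

In practice only the training split would be resampled, so that the evaluation data keeps its original class distribution. Library implementations of SMOTE differ from this sketch in details such as neighbour search and sampling strategy.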
Keywords
class-imbalance, oversampling, undersampling, resampling, classification, predictive-models