An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets

Machine Learning with Applications (2022)

Abstract
More often than not, data collected in real time tends to be imbalanced, i.e., the samples belonging to one class significantly outnumber those of the others. This degrades the performance of the predictor. One of the most notable algorithms for handling such imbalance by fabricating synthetic data is the Synthetic Minority Oversampling Technique (SMOTE). However, data imbalance is not solely responsible for the poor performance of a classifier. Prior research has demonstrated that noisy samples can play a significant role in misclassification. Handling large datasets is also computationally expensive, which makes data reduction imperative. In this work, we put forth a novel extension of SMOTE that integrates it with the Kalman filter. The proposed method, Kalman-SMOTE (KSMOTE), filters out the noisy samples from the final dataset after SMOTE, which includes both the raw data and the synthetically generated samples, thereby reducing the size of the dataset. Our model is validated on a wide range of datasets. An experimental analysis of the results shows that our model outperforms the presently available techniques.
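To make the oversampling stage concrete, the following is a minimal sketch of the classic SMOTE interpolation idea, assuming plain Euclidean distance and a pure-Python representation of samples as tuples. The function name `smote` and its parameters `n_synthetic`, `k`, and `seed` are illustrative choices, not names taken from the paper. The Kalman-filter noise-removal stage of KSMOTE is not sketched, since the abstract does not describe how it is applied.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a sample
    and one of its k nearest minority-class neighbours (illustrative sketch)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        # New point lies on the segment between `base` and `neighbour`.
        synthetic.append(
            tuple(b + lam * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with four 2-D samples (made-up values for illustration).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, a noisy minority sample can propagate its noise into the generated data, which is the kind of problem a post-SMOTE filtering stage is meant to address.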
Keywords
Imbalanced data,Oversampling,SMOTE,Noise filter