Radial-based undersampling approach with adaptive undersampling ratio determination.

Neurocomputing (2023)

Abstract
Machine learning techniques are now employed in a wide range of applications, and classification is one of the most common tasks. A classification model, learned by running a classifier learning algorithm on a collected set of training examples, predicts the class label of a previously unseen example. In many practical applications, however, the collected training sets are class imbalanced: one class has significantly more examples than the other class(es), even though the minority class usually carries a great deal of valuable information and is more important than the majority class. Most classifier learning algorithms are designed under the assumption that each class in the training set has approximately the same number of examples, so they often cannot achieve satisfactory classification performance on imbalanced data, especially for minority class examples. To solve this problem, this paper proposes a Radial-Based Undersampling approach with Adaptive undersampling Ratio (RBU-AR). The main novelty of RBU-AR is that it attempts to determine a proper undersampling ratio from the class overlap data complexity, rather than adopting the default value of 1 or relying on an empirical trial-and-error strategy as many existing undersampling approaches do. Experiments are conducted on 30 benchmark imbalanced datasets and 10 artificial datasets. The results and the corresponding statistical tests indicate that the degree of class overlap indeed has a great influence on the achievable classification performance and is usually more important than the class imbalance ratio (IR), and that RBU-AR generally achieves highly competitive or better performance compared with several state-of-the-art approaches. Therefore, this work provides a theoretical guideline for determining the proper extent of undersampling by utilizing class overlap data complexity information.
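
The quantities named in the abstract (imbalance ratio, class overlap, undersampling ratio) can be illustrated with a minimal sketch. This is not the RBU-AR algorithm: the abstract does not specify how overlap is measured or how it is mapped to a ratio, so the code below uses a hypothetical nearest-neighbour overlap proxy (`nn_overlap`) and plain random undersampling in place of the radial-based selection. The function names and the definition of `ratio` as (majority examples kept) / (minority examples) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """IR = (number of majority examples) / (number of minority examples)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def nn_overlap(X, y, minority_label):
    """Hypothetical overlap proxy (not the paper's measure): the fraction of
    minority examples whose nearest neighbour belongs to another class."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    crossed = 0
    for i in minority:
        # Nearest other example by Euclidean distance.
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(X[i], X[j]))
        crossed += y[nearest] != minority_label
    return crossed / len(minority)


def random_undersample(X, y, majority_label, ratio=1.0, seed=0):
    """Keep every non-majority example and a random subset of the majority so
    that kept_majority is about ratio * n_minority. ratio=1.0 balances the set.
    Random selection is a stand-in for the radial-based selection."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    others = [i for i, label in enumerate(y) if label != majority_label]
    n_keep = min(len(majority), max(1, round(ratio * len(others))))
    kept = sorted(rng.sample(majority, n_keep) + others)
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: 10 majority examples (label 0) and 2 minority examples (label 1).
X_toy = [[float(i)] for i in range(10)] + [[20.0], [20.5]]
y_toy = [0] * 10 + [1] * 2
print("IR:", imbalance_ratio(y_toy))
print("overlap proxy:", nn_overlap(X_toy, y_toy, minority_label=1))
X_bal, y_bal = random_undersample(X_toy, y_toy, majority_label=0, ratio=1.0)
print("class counts after undersampling:", Counter(y_bal))
```

In an adaptive scheme of the kind the abstract describes, the `ratio` argument would be chosen from the measured overlap instead of being fixed at 1; the specific rule for doing so is defined in the paper and is not reproduced here.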
Keywords
Machine learning, Classification, Class imbalance, Undersampling, Data complexity, Class overlap