A hybrid under-sampling approach for mining unbalanced datasets: applications to banking and insurance
IJDMMM(2011)
Abstract
In unbalanced classification problems, machine learning algorithms are overwhelmed by the majority class and consequently misclassify minority-class observations. Here, we propose a hybrid under-sampling approach to improve the performance of classifiers. The proposed approach first employs the k-reverse nearest neighbour (kRNN) method to detect outliers in the majority class. After the outliers are removed, K-means clustering is applied and K clusters are selected to further reduce the influence of the majority class. We then employ support vector machine (SVM), logistic regression (LR), multi-layer perceptron (MLP), radial basis function network (RBF), group method of data handling (GMDH), genetic programming (GP) and decision tree (J48) classifiers. The effectiveness of the proposed approach is demonstrated on datasets from insurance fraud detection and credit card churn in the banking domain. Ten-fold cross-validation was used throughout the study. The proposed approach is observed to improve the performance of the classifiers.
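The two under-sampling stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions the abstract does not specify: Euclidean distance, treating a majority-class point as an outlier when no other point lists it among its k nearest neighbours (`min_count=1`), and using the K-means centroids as the reduced majority set. The function names (`krnn_counts`, `remove_outliers`, `kmeans`, `hybrid_undersample`) are hypothetical.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def krnn_counts(points, k):
    """For each point, count how many other points have it among their k-NN."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def remove_outliers(points, k, min_count=1):
    """Assumed rule: drop points with fewer than min_count reverse neighbours."""
    counts = krnn_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the final centroids as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def hybrid_undersample(majority, k, n_clusters):
    """Stage 1: kRNN outlier removal. Stage 2: K-means reduction (assumed)."""
    kept = remove_outliers(majority, k)
    return kmeans(kept, n_clusters)
```

Under these assumptions, the reduced majority set returned by `hybrid_undersample` would be combined with the untouched minority class before training any of the listed classifiers. Each point is a tuple of numeric features, and `math.dist` requires Python 3.8 or later.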
Keywords
data mining