Affinity and class probability-based fuzzy support vector machine for imbalanced data sets.

Neural Networks(2020)

引用 43|浏览37
暂无评分
摘要
The learning problem from imbalanced data sets poses a major challenge in data mining community. Although conventional support vector machine can generally show relatively robust performance in dealing with the classification problems of imbalanced data sets, it treats all training samples with the same contribution for learning, which results in the final decision boundary biasing toward the majority class especially in the presence of outliers or noises. In this paper, we propose a new affinity and class probability-based fuzzy support vector machine technique (ACFSVM). The affinity of a majority class sample is calculated according to support vector description domain (SVDD) model trained only by the given majority class training samples in kernel space similar to that used for FSVM learning. The obtained affinity can be used for identifying possible outliers and some border samples existing in the majority class training samples. In order to eliminate the effect of noises, we employ the kernel k-nearest neighbor method to determine the class probability of the majority class samples in the same kernel space as before. The samples with lower class probabilities are more likely to be noises and their contribution for learning seems to be reduced by their low memberships constructed by combining the affinities and the class probabilities. Thus, ACFSVM can pay more attention to the majority class samples with higher affinities and class probabilities while reducing their effects of the ones with lower affinities and class probabilities, eventually skewing the final classification boundary toward the majority class. In addition, the minority class samples are assigned relative high memberships to guarantee their importance for the model learning. The extensive experimental results on the different imbalanced datasets from UCI repository demonstrate that the proposed approach can achieve better generalization performance in terms of G-Mean, F-Measure, and AUC as compared to the other existing imbalanced dataset classification techniques.
更多
查看译文
关键词
Imbalanced data,Fuzzy support vector machine (FSVM),Affinity,Class probability,Kernelknn
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要