Multiset Feature Learning for Highly Imbalanced Data Classification

THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE(2021)

引用 104|浏览118
暂无评分
摘要
With the expansion of data, increasing imbalanced data has emerged. When the imbalance ratio (IR) of data is high, most existing imbalanced learning methods decline seriously in classification performance. In this paper, we systematically investigate the highly imbalanced data classification problem, and propose an uncorrelated cost-sensitive multiset learning (UCML) approach for it. Specifically, UCML first constructs multiple balanced subsets through random partition, and then employs the multiset feature learning (MFL) to learn discriminant features from the constructed multiset. To enhance the usability of each subset and deal with the non-linearity issue existed in each subset, we further propose a deep metric based UCML (DM-UCML) approach. DM-UCML introduces the generative adversarial network technique into the multiset constructing process, such that each subset can own similar distribution with the original dataset. To cope with the non-linearity issue, DM-UCML integrates deep metric learning with MFL, such that more favorable performance can be achieved. In addition, DM-UCML designs a new discriminant term to enhance the discriminability of learned metrics. Experiments on eight traditional highly class-imbalanced datasets and two large-scale datasets indicate that: the proposed approaches outperform state-of-the-art highly imbalanced learning methods and are more robust to high IR.
更多
查看译文
关键词
Highly imbalanced data classification,multiset feature learning,deep metric learning,generative adversarial network,cost-sensitive factor,weighted uncorrelated constraint
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要