Solving Data Imbalance in Text Classification With Constructing Contrastive Samples.

IEEE Access (2023)

Abstract
Contrastive learning (CL) has been successfully applied in Natural Language Processing (NLP) as a powerful representation learning method and has shown promising results on various downstream tasks. Recent research has highlighted the importance of constructing effective contrastive samples through data augmentation. However, current data augmentation methods rely primarily on random word deletion, substitution, and cropping, which can introduce noisy samples and hinder representation learning. In this article, we propose a novel approach that addresses data imbalance in text classification by constructing contrastive samples. Our method uses a Label-indicative Component to generate high-quality positive samples for the minority class and introduces a Hard Negative Mixing strategy to synthesize challenging negative samples at the feature level. Applying supervised contrastive learning to these samples yields superior text representations, which significantly benefit text classification on imbalanced data. Our approach effectively mitigates distributional bias and promotes noise-resistant representation learning. To validate the effectiveness of our method, we conducted experiments on the benchmark datasets THUCNews, AG's News, and 20NG, as well as on the imbalanced FDCNews dataset. The code for our method is publicly available at the following GitHub repository: https://github.com/hanggun/CLDMTC.
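The abstract names two technical ingredients: feature-level hard negative mixing and supervised contrastive learning. Below is a minimal PyTorch sketch of both, assuming MoCHi-style convex mixing of the hardest in-batch negatives and the standard SupCon loss; the function names, the mixing coefficients, and `num_synthetic` are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn.functional as F

def mix_hard_negatives(anchor, negatives, num_synthetic=4):
    """Synthesize extra negatives by convexly mixing the negatives
    most similar to the anchor in feature space (an assumption about
    how feature-level hard negative mixing could work)."""
    sims = F.cosine_similarity(anchor.unsqueeze(0), negatives)   # (N,)
    k = min(2 * num_synthetic, negatives.size(0))
    hard = negatives[sims.topk(k).indices]                       # hardest k
    i = torch.randint(0, k, (num_synthetic,))
    j = torch.randint(0, k, (num_synthetic,))
    lam = torch.rand(num_synthetic, 1)                           # mixing coeffs
    synthetic = lam * hard[i] + (1.0 - lam) * hard[j]
    return F.normalize(synthetic, dim=-1)

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard SupCon loss over a batch of features with integer labels."""
    features = F.normalize(features, dim=-1)
    logits = features @ features.T / temperature
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    logits = logits.masked_fill(self_mask, float("-inf"))        # drop self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)          # keep positives
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0                                       # anchors with a positive
    loss = -pos_log_prob.sum(1)[valid] / pos_counts[valid]
    return loss.mean()
```

In a full pipeline, the synthetic negatives returned by `mix_hard_negatives` would be appended to the real in-batch negatives so that they enter the denominator of the contrastive loss, making the minority-class anchors harder to separate and the learned representations more discriminative.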
Keywords
data imbalance, text classification