Feature Mixing-Based Active Learning for Multi-Label Text Classification

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024)

Abstract
Active learning (AL) aims to reduce labeling costs by selecting the most valuable samples to annotate from a pool of unlabeled data. Identifying these samples is particularly challenging in multi-label text classification because the label space is high-dimensional yet sparse. Existing AL techniques either fail to capture label correlations sufficiently, which leads to label imbalance in the selected samples, or incur significant computational cost when assessing the informativeness of unlabeled samples across all labels. To address these challenges, we propose ALMuLa-mix, an efficient two-stage sample acquisition strategy for multi-label active learning. To reduce computational cost, ALMuLa-mix first identifies unlabeled samples with novel features by employing a time-efficient feature-mixing method in conjunction with label correlations. To address label imbalance, ALMuLa-mix then leverages the minority class in the labeled set to select, from the candidates with novel features, a small batch of unlabeled samples with greater inter-class diversity. Experimental results on publicly available datasets show that ALMuLa-mix outperforms other strong baselines on multi-label text classification tasks.
Keywords
Active learning, Multi-label text classification, Interpolation, Label correlation
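
The abstract gives only a high-level description of the two-stage acquisition, so the following is a loose illustrative sketch rather than the ALMuLa-mix algorithm itself. It assumes a linear multi-label classifier with sigmoid outputs (weights `W`, bias `b`), per-label anchors computed as mean labeled features, a fixed mixing ratio `alpha`, and an inverse-frequency score for favoring minority labels. All function names and parameters are hypothetical, and the paper's use of label correlations is not modeled here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def label_anchors(X_lab, Y_lab):
    """Mean labeled feature vector per label (assumed anchor choice).

    Assumes every label has at least one positive labeled sample.
    """
    return np.stack(
        [X_lab[Y_lab[:, j] == 1].mean(axis=0) for j in range(Y_lab.shape[1])]
    )


def stage1_novel_candidates(X_unl, anchors, W, b, alpha=0.2, thresh=0.5):
    """Stage 1 (assumed form): keep unlabeled samples whose predicted label
    set changes when their features are mixed with any label anchor."""
    base = sigmoid(X_unl @ W + b) > thresh          # (n_unl, n_labels)
    novel = np.zeros(len(X_unl), dtype=bool)
    for anchor in anchors:
        mixed = (1.0 - alpha) * X_unl + alpha * anchor
        pred = sigmoid(mixed @ W + b) > thresh
        novel |= (pred != base).any(axis=1)
    return np.flatnonzero(novel)


def stage2_minority_batch(cand_idx, X_unl, Y_lab, W, b, batch_size):
    """Stage 2 (assumed form): among the candidates, prefer samples likely
    to carry labels that are rare in the labeled set."""
    freq = Y_lab.sum(axis=0) + 1.0                  # smoothed label counts
    weights = 1.0 / freq                            # rarer labels weigh more
    probs = sigmoid(X_unl[cand_idx] @ W + b)
    scores = probs @ weights
    order = np.argsort(-scores)[:batch_size]
    return cand_idx[order]
```

In this sketch, stage 1 costs only a few forward passes over the unlabeled pool (one per anchor), and stage 2 scores only the reduced candidate set, which mirrors the cost-then-imbalance ordering described in the abstract.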