Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
arXiv (2024)
Abstract
Self-Supervised Contrastive Learning has proven effective in deriving
high-quality representations from unlabeled data. However, a major challenge
that hinders both unimodal and multimodal contrastive learning is feature
suppression, a phenomenon where the trained model captures only a limited
portion of the information from the input data while overlooking other
potentially valuable content. This issue often leads to indistinguishable
representations for visually similar but semantically different inputs,
adversely affecting downstream task performance, particularly on tasks requiring
rigorous semantic comprehension. To address this challenge, we propose a novel
model-agnostic Multistage Contrastive Learning (MCL) framework. Unlike standard
contrastive learning, which inherently captures a single, biased feature
distribution, MCL progressively learns previously unlearned features through
feature-aware negative sampling at each stage, where the negative samples of an
anchor are exclusively selected from the cluster it was assigned to in
preceding stages. Meanwhile, MCL preserves the previously well-learned features
through cross-stage representation integration, which combines features from all
stages to form the final representations. Our comprehensive evaluation demonstrates
MCL's effectiveness and superiority across both unimodal and multimodal
contrastive learning, spanning a range of model architectures from ResNet to
Vision Transformers (ViT). Remarkably, in tasks where the original CLIP model
has shown limitations, MCL dramatically enhances performance, with improvements
of up to threefold on specific attributes in the recently proposed MMVP benchmark.
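The two mechanisms the abstract names are concrete enough to sketch. The snippet below is a minimal, hypothetical PyTorch rendering, not the authors' implementation: the function names (`assign_clusters`, `stage_loss`, `integrate_stages`), the choice of k-means for cluster assignment, the InfoNCE-style masked loss, and simple concatenation as the integration step are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def assign_clusters(features: torch.Tensor, num_clusters: int) -> torch.Tensor:
    """Cluster a stage's learned representations with k-means (an assumed
    choice of clustering algorithm); the labels drive the next stage's
    feature-aware negative sampling."""
    labels = KMeans(n_clusters=num_clusters, n_init=10).fit_predict(
        features.detach().cpu().numpy()
    )
    return torch.as_tensor(labels, device=features.device)


def stage_loss(z_anchor, z_positive, prev_cluster_ids, temperature=0.1):
    """InfoNCE-style loss in which each anchor's negatives are restricted to
    in-batch samples assigned to the SAME cluster in the preceding stage,
    pushing the new stage to separate inputs the old features conflated."""
    z_a = F.normalize(z_anchor, dim=1)
    z_p = F.normalize(z_positive, dim=1)
    logits = z_a @ z_p.t() / temperature  # (B, B) pairwise similarities
    same_cluster = prev_cluster_ids.unsqueeze(0) == prev_cluster_ids.unsqueeze(1)
    positives = torch.eye(len(z_a), dtype=torch.bool, device=z_a.device)
    # Keep only the positive pair (diagonal) and same-cluster negatives.
    logits = logits.masked_fill(~(same_cluster | positives), float("-inf"))
    targets = torch.arange(len(z_a), device=z_a.device)
    return F.cross_entropy(logits, targets)


def integrate_stages(per_stage_features):
    """Cross-stage representation integration: here simply concatenating the
    normalized representations from every stage into the final embedding."""
    return torch.cat([F.normalize(z, dim=1) for z in per_stage_features], dim=1)
```

Under these assumptions, a run would train stage 1 with an ordinary contrastive loss, call `assign_clusters` on its embeddings, train stage 2 with `stage_loss` so that each anchor's negatives come only from its previous-stage cluster, and finally call `integrate_stages` to form the representations used downstream.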