Vision-Language Alignment Learning Under Affinity and Divergence Principles for Few-Shot Out-of-Distribution Generalization

Lin Zhu, Weihan Yin, Yiyao Yang, Fan Wu, Zhaoyu Zeng, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye

International Journal of Computer Vision (2024)

Abstract
Recent advances in fine-tuning large-scale vision-language pre-trained models (VL-PTMs) have shown promising results in rapid adaptation to downstream tasks. However, prior research often lacks a comprehensive investigation of out-of-distribution (OOD) generalization. Fine-tuning carries a risk of overfitting, especially on few-shot OOD datasets where significant distribution shifts occur between the few-shot training examples and the test sets. Previous research on fine-tuning’s robustness to distribution shifts does not consider the different characteristics of distribution shifts and may not effectively handle noisy data with spurious correlations. To address these challenges, we propose Vision-Language Alignment Learning under Affinity and Divergence Principles (VLAD), which adapts VL-PTMs for robust few-shot OOD generalization with theoretical guarantees. Building on the large-scale pre-trained vision-language foundation model CLIP, we leverage frozen language embeddings as invariant anchors to protect against distribution shifts, while using adapter layers to fine-tune pre-trained visual features for improved vision-language alignment. In addition, we introduce affinity and divergence principles to further mitigate overfitting during the vision-language alignment process by increasing class discrimination and suppressing non-causal features. More importantly, we offer theoretical evidence highlighting the superiority of general language knowledge in achieving more robust OOD generalization performance. Our theoretical analysis also shows that the proposed regularization loss yields a tighter upper bound on the OOD generalization error. Our approach is substantiated by extensive experiments and ablation studies on diverse datasets, validating our theoretical findings. The code is available at https://github.com/LinLLLL/VLAD.
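The alignment step described in the abstract (frozen language embeddings as class anchors, with an adapter applied to pre-trained visual features) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ResidualAdapter` class, its bottleneck shape, the residual mixing `ratio`, the `temperature` value, and the random stand-in features are all assumptions made for illustration. The affinity and divergence regularizers are not specified in the abstract and are not sketched here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class ResidualAdapter:
    """Hypothetical bottleneck adapter on top of frozen image features.

    Only w1 and w2 would be trained; the image encoder producing `feats`
    and the text anchors are assumed to stay frozen.
    """

    def __init__(self, dim, hidden, ratio=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, (dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (hidden, dim))
        self.ratio = ratio  # assumed mixing weight between adapted and original features

    def __call__(self, feats):
        adapted = np.maximum(feats @ self.w1, 0.0) @ self.w2  # ReLU bottleneck
        return self.ratio * adapted + (1.0 - self.ratio) * feats


def class_logits(image_feats, text_anchors, adapter, temperature=0.01):
    """Cosine similarity between adapted image features and frozen text anchors."""
    v = l2_normalize(adapter(image_feats))
    t = l2_normalize(text_anchors)
    return (v @ t.T) / temperature


# Random stand-ins for pre-trained features: 4 images, 3 classes, 16 dimensions.
rng = np.random.default_rng(1)
image_feats = rng.normal(size=(4, 16))
text_anchors = rng.normal(size=(3, 16))
logits = class_logits(image_feats, text_anchors, ResidualAdapter(16, 4))
predictions = logits.argmax(axis=1)  # one predicted class index per image
```

In this sketch the text anchors never receive updates, so the class prototypes stay fixed while only the small adapter moves the visual features toward them.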
Keywords
Pre-trained models, Alignment learning, Out-of-distribution generalization