Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
arXiv (2024)
Abstract
Mainstream parameter-efficient fine-tuning (PEFT) methods, such as LoRA or
Adapter, project a model's hidden states to a lower dimension, allowing
pre-trained models to adapt to new data through this low-rank bottleneck.
However, PEFT tasks involving multiple modalities, like vision-language (VL)
tasks, require not only adaptation to new data but also learning the
relationship between different modalities. Targeting VL PEFT tasks, we
propose a family of operations, called routing functions, to enhance VL
alignment in the low-rank bottlenecks. The routing functions adopt linear
operations and do not introduce new trainable parameters. In-depth analyses are
conducted to study their behavior. In various VL PEFT settings, the routing
functions significantly improve the performance of the original PEFT methods,
achieving improvements of over 20% (RoBERTa_large+ViT-L/16) and 30%
(GPT2-medium+ViT-L/16). When fine-tuning a pre-trained multimodal model
such as CLIP-BART, we also observe smaller but consistent improvements across a
range of VL PEFT tasks.
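
To make the idea of a parameter-free routing step inside a low-rank bottleneck concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `lora_with_routing`, the choice of element-wise multiplication and addition as routing operations, the reuse of the same down-projection for the other modality's feature, and the use of a single pooled feature vector `v` are all assumptions made for this example.

```python
import numpy as np

def lora_with_routing(h, v, W_down, W_up, routing="mul"):
    """LoRA-style residual update with an optional parameter-free routing step.

    h       : (n, d) hidden states of one modality (e.g. text tokens)
    v       : (d,)   pooled feature of the other modality (assumed already in dimension d)
    W_down  : (d, r) down-projection into the low-rank bottleneck
    W_up    : (r, d) up-projection back to the model dimension
    routing : "mul", "add", or "none" (hypothetical names for this sketch)
    """
    z = h @ W_down        # (n, r): hidden states in the bottleneck
    z_v = v @ W_down      # (r,):  other modality, reusing W_down (assumption)

    # Routing mixes the two modalities inside the bottleneck.
    # Neither branch adds trainable parameters.
    if routing == "mul":
        z = z * z_v       # element-wise product, broadcast over the n tokens
    elif routing == "add":
        z = z + z_v       # element-wise sum, broadcast over the n tokens
    elif routing != "none":
        raise ValueError(f"unknown routing: {routing}")

    return h + z @ W_up   # residual update, as in a LoRA/Adapter branch

# Example usage with small random matrices.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))       # 4 tokens, model dimension 8
v = rng.normal(size=8)            # one pooled feature from the other modality
W_down = rng.normal(size=(8, 2))  # bottleneck rank 2
W_up = rng.normal(size=(2, 8))
out = lora_with_routing(h, v, W_down, W_up, routing="mul")
```

With `routing="none"` the function reduces to a plain low-rank update. The only trainable tensors in all three modes are `W_down` and `W_up`, which matches the abstract's claim that routing functions introduce no new trainable parameters.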