Your representations are in the network: composable and parallel adaptation for large scale models

NeurIPS 2023

Abstract
We present a framework for transfer learning that efficiently adapts a large base model by learning lightweight cross-attention modules attached to its intermediate activations. We name our approach InCA (Introspective-Cross-Attention) and show that it can efficiently survey a network's representations and identify strong-performing adapter models for a downstream task. During training, InCA trains numerous adapters efficiently and in parallel, isolated from the frozen base model. On the ViT-L/16 architecture, our experiments show that a single adapter, 1.3% of the size of the full model, reaches full fine-tuning accuracy on average across 11 challenging downstream classification tasks. Compared with other forms of parameter-efficient adaptation, the isolated nature of InCA adaptation is computationally desirable for large-scale models. For instance, we adapt ViT-G/14 (1.8B+ parameters) quickly with 20+ adapters in parallel on a single V100 GPU (a 76% GPU memory reduction) and exhaustively identify its most useful representations. We further demonstrate how the adapters learned by InCA can be incrementally modified or combined for flexible learning scenarios, and our approach achieves state-of-the-art performance on the ImageNet-to-Sketch multi-task benchmark.
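To make the setup concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a small cross-attention head reads the intermediate token activations of a frozen backbone, and several such heads can be trained in parallel from a single frozen forward pass. The timm backbone name, probed block indices, class count, and all module names are illustrative assumptions, not the authors' implementation.

# Minimal sketch of an InCA-style cross-attention adapter; hypothetical choices,
# not the paper's code.
import torch
import torch.nn as nn
import timm


class CrossAttentionAdapter(nn.Module):
    """Lightweight head: a learnable query cross-attends to intermediate tokens."""

    def __init__(self, dim: int, num_classes: int, num_heads: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) activations captured from a frozen backbone block
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, self.norm(tokens), self.norm(tokens))
        return self.head(pooled.squeeze(1))


# Frozen backbone; set pretrained=True in practice to adapt a pre-trained model.
backbone = timm.create_model("vit_large_patch16_224", pretrained=False).eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# One adapter per probed block; forward hooks cache detached activations so a
# single frozen forward pass feeds every adapter, trained in parallel and in
# isolation from the backbone.
probe_layers = [11, 17, 23]  # hypothetical choice of blocks to survey
adapters = nn.ModuleList(
    CrossAttentionAdapter(backbone.embed_dim, num_classes=100) for _ in probe_layers
)
cache = {}
for i in probe_layers:
    backbone.blocks[i].register_forward_hook(
        lambda m, inp, out, i=i: cache.__setitem__(i, out.detach())
    )

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    backbone(x)  # no gradients flow through the frozen base model
logits_per_adapter = [adapter(cache[i]) for adapter, i in zip(adapters, probe_layers)]

Because gradients never propagate into the backbone, the cost of adding more adapters grows only with the adapters themselves, which is what makes surveying many layers in parallel cheap in this sketch.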
Keywords
lightweight transfer, models, cross-attention, pre-trained