Asymmetry in Low-Rank Adapters of Foundation Models
CoRR (2024)
Abstract
Parameter-efficient fine-tuning optimizes large, pre-trained foundation
models by updating a subset of parameters; in this class, Low-Rank Adaptation
(LoRA) is particularly effective. Inspired by an effort to investigate the
different roles of LoRA matrices during fine-tuning, this paper characterizes
and leverages unexpected asymmetry in the importance of low-rank adapter
matrices. Specifically, when updating the parameter matrices of a neural
network by adding a product BA, we observe that the B and A matrices have
distinct functions: A extracts features from the input, while B uses these
features to create the desired output. Based on this observation, we
demonstrate that fine-tuning B is inherently more effective than fine-tuning
A, and that a random untrained A should perform nearly as well as a
fine-tuned one. Using an information-theoretic lens, we also bound the
generalization of low-rank adapters, showing that the parameter savings of
exclusively training B improve the bound. We support our conclusions with
experiments on RoBERTa, BART-Large, LLaMA-2, and ViTs.
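
To make the abstract's setup concrete, here is a minimal sketch (not the authors' released code) of the B-only adaptation it describes: the pretrained weight W is updated by a low-rank product BA, where A is left as a random, frozen projection and only B is trained. The class name LoRALinear, the rank, and the initialization choices are illustrative assumptions.

```python
# Minimal sketch of a LoRA layer that trains only B, per the abstract;
# names and hyperparameters here are assumptions, not the paper's code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a low-rank update B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        # Stand-in for the pretrained weight; frozen during fine-tuning.
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # A: random, untrained projection -- per the paper's observation,
        # it "extracts features from the input" and need not be trained.
        self.A = nn.Parameter(torch.randn(rank, in_features) / rank ** 0.5,
                              requires_grad=False)
        # B: the only trained matrix; zero-initialized so the adapted layer
        # starts out identical to the pretrained one.
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + BA, matching the abstract's formulation.
        return x @ (self.weight + self.B @ self.A).T

layer = LoRALinear(768, 768, rank=8)
print([n for n, p in layer.named_parameters() if p.requires_grad])  # ['B']
```

Note that freezing A roughly halves the adapter's trainable parameters relative to standard LoRA (exactly so when in_features equals out_features), which is the parameter saving the abstract's generalization-bound argument appeals to.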