Parameter-efficient fine-tuning of large-scale pre-trained language models

Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Xiaozhi Wang, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang, Juanzi Li, Maosong Sun
With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been continuously shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters is prohibitively costly and eventually becomes practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down computation and storage costs. In general, this line of work demonstrates that large-scale models can be effectively stimulated by the optimization of a few parameters. Despite the various designs, here we discuss and analyse the approaches under a more consistent and accessible term ‘delta-tuning’, where ‘delta’, a mathematical notation often used to denote changes, is borrowed to refer to the portion of parameters that are ‘changed’ during training. We formally describe the problem and propose a unified categorization criterion for existing delta-tuning methods to explore their correlations and differences. We also discuss the theoretical principles underlying the effectiveness of delta-tuning and interpret them from the perspectives of optimization and optimal control. Furthermore, we provide a holistic empirical study on over 100 natural language processing tasks and investigate various aspects of delta-tuning. With comprehensive study and analysis, our research demonstrates the theoretical and practical properties of delta-tuning in the adaptation of PLMs.
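As a concrete illustration of the core idea (a minimal sketch, not taken from the paper: the shapes, rank, and low-rank parameterization here are assumptions, loosely in the style of LoRA-like delta-tuning methods the survey covers), one can keep a large pre-trained weight matrix frozen and train only a tiny "delta" parameterized by two low-rank factors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes and a small rank r (assumed values for illustration).
d_in, d_out, r = 768, 768, 8

# Pre-trained weight: frozen, never updated during adaptation.
W_frozen = rng.standard_normal((d_out, d_in))

# Delta parameters: a low-rank update A @ B, the only part that is trained.
A = np.zeros((d_out, r))                    # zero init so the delta starts at 0
B = rng.standard_normal((r, d_in)) * 0.01   # small random init

def forward(x):
    """Effective weight is W_frozen + A @ B; gradients flow only into A and B."""
    return (W_frozen + A @ B) @ x

total = W_frozen.size + A.size + B.size
trainable = A.size + B.size
print(f"trainable parameters: {trainable} / {total} "
      f"({trainable / total:.4f} of the layer)")
```

For this layer, only about 2% of the parameters are optimized, which is the source of the drastic storage savings: each downstream task needs only the small delta (here `A` and `B`) on top of one shared frozen model.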