Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
CoRR (2023)
Abstract
In this paper, we unveil that Language Models (LMs) can acquire new
capabilities by assimilating parameters from homologous models without
retraining or GPUs. We first introduce DARE to set most delta parameters (i.e.,
the disparity between fine-tuned and pre-trained parameters) to zeros without
affecting the abilities of Supervised Fine-Tuning (SFT) LMs, which randomly
Drops delta parameters with a ratio p And REscales the remaining ones by 1/(1 -
p) to approximate the original embeddings. Then, we use DARE as a versatile
plug-and-play technique to sparsify delta parameters of multiple SFT homologous
models for mitigating parameter interference and merge them into a single model
by parameter fusing. We experiment with encoder- and decoder-based LMs, showing
that: (1) SFT delta parameter value ranges are typically small (within 0.005)
with extreme redundancy, and DARE can effortlessly eliminate 90% of
them. (2) DARE can merge multiple task-specific LMs into one LM with diverse
capabilities. For instance, the amalgamation of WizardLM and WizardMath
significantly enhances the GSM8K zero-shot accuracy of WizardLM from 2.2 to
66.3, retaining the instruction-following proficiency while surpassing
WizardMath's 64.2 performance. Our merged LM also ranks first among models with
7 billion parameters on the Open LLM Leaderboard.
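The drop-and-rescale operation described above can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the authors' implementation: `dare` and `dare_merge` are hypothetical names, and the merge step here uses plain averaging of the sparsified deltas as one simple form of parameter fusing.

```python
import numpy as np

def dare(pretrained, finetuned, p, rng):
    """Drop And REscale: zero each delta parameter with probability p,
    then scale survivors by 1/(1-p) so the expected delta is unchanged."""
    delta = finetuned - pretrained
    mask = rng.random(delta.shape) >= p  # keep with probability 1 - p
    return pretrained + np.where(mask, delta, 0.0) / (1.0 - p)

def dare_merge(pretrained, finetuned_models, p, seed=0):
    """Sparsify each homologous model's delta with DARE, then fuse by
    averaging the sparse deltas onto the shared pre-trained weights."""
    rng = np.random.default_rng(seed)
    deltas = [dare(pretrained, ft, p, rng) - pretrained
              for ft in finetuned_models]
    return pretrained + np.mean(deltas, axis=0)
```

Because the surviving deltas are rescaled by 1/(1 - p), the expected value of each merged parameter matches the dense delta, which is why dropping 90% of delta parameters can leave the fine-tuned abilities intact.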
Keywords
super mario,language models,absorbing abilities