Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
CoRR (2023)
Abstract
The rapid development of AI systems has been greatly influenced by the
emergence of foundation models. A common approach for targeted problems
involves fine-tuning these pre-trained foundation models for specific target
tasks, resulting in a rapid spread of models fine-tuned across a diverse array
of tasks. This work focuses on the problem of merging multiple fine-tunings of
the same foundation model derived from a spectrum of auxiliary tasks. We
introduce a new simple method, Model Breadcrumbs, which consists of a sparsely
defined set of weights that carve out a trajectory within the weight space of a
pre-trained model, enhancing task performance when traversed. These breadcrumbs
are constructed by subtracting the pre-trained model's weights from the
fine-tuned model's weights, followed by a sparsification process that eliminates weight
outliers and negligible perturbations. Our experiments demonstrate the
effectiveness of Model Breadcrumbs to simultaneously improve performance across
multiple tasks. This contribution aligns with the evolving paradigm of
updatable machine learning, reminiscent of the collaborative principles
underlying open-source software development, fostering a community-driven
effort to reliably update machine learning models. Our method is shown to be
more efficient and, unlike previous proposals, does not require hyperparameter
tuning for each new task added. Through extensive experimentation involving
various models, tasks, and modalities, we establish that integrating Model
Breadcrumbs offers a simple, efficient, and highly effective approach for
constructing multi-task models and facilitating updates to foundation models.
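Concretely, the construction described above can be read as: for each layer, form the task direction (fine-tuned weights minus pre-trained weights), zero out both the largest-magnitude outliers and the smallest, negligible perturbations, and add the surviving directions from all tasks back onto the pre-trained weights. The PyTorch-style sketch below is only an illustrative reading of that description, assuming hypothetical per-tensor masking fractions (top_frac, bottom_frac), a merge coefficient alpha, and floating-point weight tensors; it is not the authors' released implementation.

```python
import torch

def breadcrumb_direction(pretrained: torch.Tensor, finetuned: torch.Tensor,
                         top_frac: float = 0.01, bottom_frac: float = 0.85) -> torch.Tensor:
    """Sparsified task direction for one weight tensor: drop the largest-magnitude
    outliers (top_frac) and the smallest, negligible perturbations (bottom_frac).
    The fraction values here are placeholders, not the paper's settings."""
    direction = finetuned - pretrained
    n = direction.numel()
    order = direction.abs().flatten().argsort()           # indices, ascending by |value|
    keep = torch.ones(n, dtype=torch.bool)
    keep[order[: int(bottom_frac * n)]] = False           # mask negligible perturbations
    keep[order[n - int(top_frac * n):]] = False           # mask weight outliers
    return direction * keep.view_as(direction)

def merge_breadcrumbs(pretrained_state: dict, finetuned_states: list, alpha: float = 0.3) -> dict:
    """Add the summed breadcrumbs of several fine-tunings onto the pre-trained
    weights to obtain a single multi-task model (alpha is a hypothetical scale)."""
    merged = {}
    for name, base in pretrained_state.items():
        total = torch.zeros_like(base)
        for ft in finetuned_states:
            total += breadcrumb_direction(base, ft[name])
        merged[name] = base + alpha * total
    return merged
```

In the abstract's terms, the masking step is what turns a dense fine-tuning difference into a sparse "breadcrumb", and because the same masking fractions can be reused across tasks, merging in a new fine-tuning does not require a fresh per-task hyperparameter search.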