SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models
CoRR (2024)
Abstract
The continual learning (CL) ability is vital for deploying large language
models (LLMs) in the dynamic world. Existing methods devise a learning module
to acquire task-specific knowledge with parameter-efficient tuning (PET) blocks
and a selection module to pick out the corresponding block for each test input,
aiming to handle the challenges of catastrophic forgetting and knowledge
transfer in CL. However, these methods tend to address only one of the
challenges, ignoring the potential of aligning the two modules to tackle
catastrophic forgetting and knowledge transfer simultaneously. To this end, we
propose a novel Shared Attention Framework (SAPT) that aligns PET learning and
selection via a Shared Attentive Learning & Selection module. Extensive
experiments on two CL benchmarks demonstrate the superiority of SAPT. Moreover,
SAPT consistently retains this advantage when scaled to different model sizes
(from 770M to 13B), different model architectures (T5 and LLaMA-2), and unseen
tasks.
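To make the shared attentive learning & selection idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation). It assumes LoRA-style low-rank adapters as the PET blocks and a single set of attention weights, computed from the input representation, that both mixes the per-task adapter outputs during learning and serves as the selection signal at test time. The class name `SharedAttentivePET` and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SharedAttentivePET(nn.Module):
    """Hypothetical sketch: a pool of per-task PET blocks (here, low-rank
    adapters) combined by one shared attention over the input. The same
    attention weights drive both learning (mixing PET outputs) and
    selection (indicating which task's block applies at test time)."""

    def __init__(self, hidden_dim: int, rank: int, num_tasks: int):
        super().__init__()
        # One low-rank (down/up) adapter per task.
        self.down = nn.Parameter(torch.randn(num_tasks, hidden_dim, rank) * 0.02)
        self.up = nn.Parameter(torch.zeros(num_tasks, rank, hidden_dim))
        # Shared attention keys: one key per task-specific PET block.
        self.task_keys = nn.Parameter(torch.randn(num_tasks, hidden_dim) * 0.02)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim)
        query = hidden.mean(dim=1)                  # pooled query, (batch, hidden_dim)
        scores = query @ self.task_keys.T           # (batch, num_tasks)
        weights = torch.softmax(scores, dim=-1)     # shared attention weights

        # Each task's adapter output: hidden @ down_t @ up_t
        per_task = torch.einsum("bsh,thr,trd->btsd", hidden, self.down, self.up)
        # Aggregate the per-task PET outputs with the shared weights.
        mixed = torch.einsum("bt,btsd->bsd", weights, per_task)
        return hidden + mixed                       # residual update


# Usage (illustrative shapes only):
# layer = SharedAttentivePET(hidden_dim=768, rank=8, num_tasks=4)
# out = layer(torch.randn(2, 16, 768))   # -> (2, 16, 768)
```

The design point mirrored here is that a single softmax over task keys serves both roles, so PET learning and PET selection stay aligned instead of being trained as two disjoint modules.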