Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
CoRR (2024)
Abstract
Large Language Models (LLMs) possess outstanding capabilities in addressing
various natural language processing (NLP) tasks. However, the sheer size of
these models poses challenges for storage, training, and inference, since
layer stacking brings their parameter counts into the billions. While
traditional approaches such as model pruning or distillation offer ways to
reduce model size, they often come at the expense of performance. In our
investigation, we systematically explore the approach of reducing the number
of layers in LLMs. Surprisingly, we observe that even with fewer layers, LLMs
maintain similar or better performance, particularly in prompt-based
fine-tuning for text classification tasks. Remarkably, in certain cases,
models with a single layer outperform their fully layered counterparts. These
findings offer valuable insights for future work aimed at mitigating the size
constraints of LLMs while preserving their performance, thereby opening
avenues for significantly more efficient use of LLMs.
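
To make the layer-cutting idea concrete, the sketch below truncates a pretrained model to its first k transformer blocks before fine-tuning. This is a minimal sketch under stated assumptions, not the authors' code: the model name (gpt2), the attribute path model.transformer.h, and the value of k are illustrative choices, and the paper's actual models and layer-selection details are not reproduced here.

```python
from transformers import AutoModelForCausalLM

model_name = "gpt2"  # illustrative; not necessarily a model evaluated in the paper
k = 1                # layers to keep; the paper reports single-layer models can compete

model = AutoModelForCausalLM.from_pretrained(model_name)

# GPT-2 stores its transformer blocks in an nn.ModuleList at model.transformer.h.
# Slicing a ModuleList returns a ModuleList, so the stack can be truncated directly.
model.transformer.h = model.transformer.h[:k]
model.config.n_layer = k  # keep the config consistent with the slimmed stack

# The slimmed model can then be fine-tuned as usual, e.g. with prompt-based
# fine-tuning for text classification as studied in the paper.
print(sum(p.numel() for p in model.parameters()))  # far fewer parameters than the full model
```

Keeping the first k blocks is only one possible selection strategy; which layers are removed, and how the remaining ones are fine-tuned, are design choices the paper investigates.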