A Survey on Knowledge Distillation of Large Language Models
CoRR (2024)
Abstract
This survey presents an in-depth exploration of knowledge distillation (KD)
techniques within the realm of Large Language Models (LLMs), spotlighting the
pivotal role of KD in transferring sophisticated capabilities from proprietary
giants such as GPT-4 to accessible, open-source models like LLaMA and Mistral.
Amidst the evolving AI landscape, this work elucidates the critical disparities
between proprietary and open-source LLMs, demonstrating how KD serves as an
essential conduit for imbuing the latter with the former's advanced
functionalities and nuanced understandings. Our survey is meticulously
structured around three foundational pillars: algorithm, skill, and
verticalization – providing a comprehensive examination of KD mechanisms, the
enhancement of specific cognitive abilities, and their practical implications
across diverse fields. Crucially, the survey navigates the intricate interplay
between data augmentation (DA) and KD, illustrating how DA emerges as a
powerful paradigm within the KD framework to bolster LLMs' performance. By
leveraging DA to generate context-rich, skill-specific training data, KD
transcends traditional boundaries, enabling open-source models to approximate
the contextual adeptness, ethical alignment, and deep semantic insights
characteristic of their proprietary counterparts. This work aims to provide an
insightful guide for researchers and practitioners, offering a detailed
overview of current methodologies in knowledge distillation and proposing
future research directions. By bridging the gap between proprietary and
open-source LLMs, this survey underscores the potential for more accessible,
efficient, and sustainable AI solutions, fostering a more inclusive and
equitable landscape in AI advancements. An associated GitHub repository is
available at https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs.
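To make the DA-plus-KD pipeline described in the abstract concrete, below is a minimal sketch in Python of sequence-level distillation through teacher-generated data: a proprietary teacher authors skill-specific (instruction, answer) pairs, and an open-source student is fine-tuned on them with standard next-token cross-entropy. This is a hedged illustration, not the survey's reference implementation; the `query_teacher` stub, the seed prompts, and the chosen student model id are all assumptions introduced here.

```python
# Sketch of knowledge distillation via data augmentation:
# a proprietary teacher LLM produces skill-specific training pairs,
# and an open-source student is fine-tuned on them (supervised SFT).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def query_teacher(prompt: str) -> str:
    # Stand-in for a call to a proprietary teacher (e.g. GPT-4).
    # A real pipeline would call the provider's API here; a canned
    # answer is returned so this sketch runs end to end.
    return "Chain-of-thought reasoning decomposes a problem into steps..."

# 1. Data augmentation: expand a small seed set into context-rich
#    (instruction, answer) pairs authored by the teacher.
seed_prompts = ["Explain chain-of-thought reasoning in one paragraph."]
distill_data = [(p, query_teacher(p)) for p in seed_prompts]

# 2. Fine-tune the student on the teacher's outputs with next-token
#    cross-entropy. Model id is illustrative; any small causal LM
#    (e.g. a tiny GPT-2) can be substituted for a quick dry run.
student_id = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

for instruction, answer in distill_data:
    batch = tok(instruction + "\n" + answer, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Surveyed approaches differ mainly in how the first step elicits knowledge from the teacher; the supervised fine-tuning core of the student typically stays close to this form.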