BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
CoRR (2024)
Abstract
The upscaling of Large Language Models (LLMs) has yielded impressive advances
in natural language processing, yet it also poses significant deployment
challenges. Weight quantization has emerged as a widely embraced solution to
reduce memory and computational demands. This paper introduces BitDistiller, a
framework that synergizes Quantization-Aware Training (QAT) with Knowledge
Distillation (KD) to boost the performance of LLMs at ultra-low precisions
(sub-4-bit). Specifically, BitDistiller first incorporates a tailored
asymmetric quantization and clipping technique to maximally preserve the
fidelity of quantized weights, and then proposes a novel Confidence-Aware
Kullback-Leibler Divergence (CAKLD) objective, which is employed in a
self-distillation manner to enable faster convergence and superior model
performance. Empirical evaluations demonstrate that BitDistiller significantly
surpasses existing methods in both 3-bit and 2-bit configurations on general
language understanding and complex reasoning benchmarks. Notably, BitDistiller
is shown to be more cost-effective, demanding less data and fewer training
resources. The code is available at https://github.com/DD-DuDa/BitDistiller.
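The abstract names two technical ingredients: asymmetric weight quantization with clipping, and a Confidence-Aware KL Divergence (CAKLD) objective used for self-distillation. The sketch below is not the authors' implementation; it only illustrates the general shape of those ideas under stated assumptions. The clip-search grid, the MSE criterion, and the way the confidence coefficient gamma blends forward and reverse KL are all assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the BitDistiller code): asymmetric
# fake-quantization with a searched clipping ratio, and a confidence-weighted
# blend of forward/reverse KL as a stand-in for the CAKLD objective.

import torch
import torch.nn.functional as F


def asym_quantize(w: torch.Tensor, n_bits: int, clip_ratio: float = 1.0) -> torch.Tensor:
    """Fake-quantize a weight tensor with asymmetric (min/max) levels.

    Clipping shrinks the dynamic range before computing scale and zero-point,
    trading outlier fidelity for finer resolution on typical weights.
    """
    w_min = w.min() * clip_ratio
    w_max = w.max() * clip_ratio
    qmax = 2 ** n_bits - 1
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale  # dequantized ("fake-quant") weights


def search_clip_ratio(w: torch.Tensor, n_bits: int, grid=(1.0, 0.95, 0.9, 0.85, 0.8)) -> float:
    """Pick the clip ratio minimizing quantization MSE (a simple proxy;
    the paper's actual search criterion may differ)."""
    best, best_err = 1.0, float("inf")
    for r in grid:
        err = (asym_quantize(w, n_bits, r) - w).pow(2).mean().item()
        if err < best_err:
            best, best_err = r, err
    return best


def cakld_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, gamma: float) -> torch.Tensor:
    """Confidence-aware KL sketch: blend reverse and forward KL with a
    coefficient gamma (assumed to reflect the teacher's confidence)."""
    log_p_t = F.log_softmax(teacher_logits, dim=-1)
    log_p_s = F.log_softmax(student_logits, dim=-1)
    forward_kl = F.kl_div(log_p_s, log_p_t, log_target=True, reduction="batchmean")
    reverse_kl = F.kl_div(log_p_t, log_p_s, log_target=True, reduction="batchmean")
    return gamma * reverse_kl + (1.0 - gamma) * forward_kl
```

In the self-distillation setup described by the abstract, the full-precision model would act as its own teacher: the student's weights are fake-quantized in the forward pass, and the blended KL term above (or whatever objective the paper actually defines) drives the quantization-aware training updates.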