Mitigating Outlier Activations in Low-Precision Fine-Tuning of Language Models
CoRR (2023)
Abstract
Low-precision fine-tuning of language models has gained prominence as a
cost-effective and energy-efficient approach to deploying large-scale models in
various applications. However, this approach is sensitive to outlier values in
the activations. Outliers can degrade the performance of fine-tuning language
models in the low-precision regime because they inflate the scaling factor,
which makes smaller values harder to represent. This paper investigates
techniques for mitigating outlier activations in low-precision integer
fine-tuning of language models. Our proposed approach represents the outlier
activation values as 8-bit integers instead of floating-point (FP16) values.
The benefit of using integers for outlier values is that it lets us apply
operator tiling and so avoid 16-bit integer matrix multiplication when
handling outliers. We provide theoretical analysis and
supporting experiments to demonstrate the effectiveness of our approach in
improving the robustness and performance of low-precision fine-tuned language
models.
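
The effect of an outlier on the scaling factor can be illustrated with a small sketch. It assumes symmetric absmax quantization to the int8 range, which the abstract does not specify; the helper `quantize_int8` and the sample activation values are hypothetical and are not taken from the paper.

```python
# Minimal sketch, assuming symmetric absmax int8 quantization (an assumption;
# the abstract does not name the quantizer). All names and values are
# hypothetical and chosen only for illustration.

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Small activations only: the scale is tiny, so each value gets a distinct code.
small = [0.02, -0.05, 0.1, 0.08]
codes_small, scale_small = quantize_int8(small)

# Same activations plus one outlier: the scale grows with the outlier,
# and every small value now rounds to the code 0.
with_outlier = small + [60.0]
codes_outlier, scale_outlier = quantize_int8(with_outlier)

print("scale without outlier:", scale_small)
print("codes without outlier:", codes_small)
print("scale with outlier:   ", scale_outlier)
print("codes with outlier:   ", codes_outlier)
print("recovered values:     ", dequantize(codes_outlier, scale_outlier))
```

In this sketch the single outlier takes the whole integer range for itself, and the four small activations become indistinguishable after quantization. This is the loss of resolution that the abstract attributes to the scaling factor.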