IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers
arXiv (2024)
Abstract
GEneral Matrix Multiply (GEMM) is a central operation in deep learning and
corresponds to the largest chunk of the compute footprint. Therefore, improving
its efficiency is an active topic of ongoing research. A popular strategy is
the use of low bit-width integers to approximate the original entries in a
matrix. This allows efficiency gains, but often requires sophisticated
techniques to control the rounding error incurred. In this work, we first
verify that when the low bit-width restriction is removed, for a variety
of Transformer-based models, integers are sufficient for all the GEMMs needed
– for both training and inference stages – and can achieve parity with their
floating point counterparts. No sophisticated techniques are needed. We find
that while a large majority of entries in matrices (encountered in such models)
can be easily represented by low bit-width integers, the existence of a
few heavy hitter entries makes it difficult to achieve efficiency gains via
low bit-width GEMMs alone. To address this issue, we develop a
simple algorithm, Integer Matrix Unpacking (IM-Unpack), to unpack a
matrix with large integer entries into a larger matrix whose entries all lie
within the representable range of arbitrarily low bit-width integers. This
allows equivalence with the original GEMM, i.e., the exact result can be
obtained using purely low bit-width integer GEMMs. This comes at the cost of
additional operations – we show that for many popular models, this overhead is
quite small.
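The abstract's equivalence claim can be illustrated with a small sketch. The snippet below is not the IM-Unpack construction itself (which unpacks into a single larger matrix); it shows the simpler underlying fact that an integer GEMM with large entries can be recovered exactly from GEMMs whose operands fit in an arbitrarily small base. The function names, the choice of base, and the restriction to non-negative entries are all assumptions made for illustration.

```python
import numpy as np

def digit_planes(M, base):
    """Decompose a non-negative integer matrix M into digit planes:
    M = sum_k base**k * planes[k], with every plane entry in [0, base).
    (Illustrative helper, not part of the paper's algorithm.)"""
    planes = []
    R = M.copy()
    while R.any():
        planes.append(R % base)
        R //= base
    return planes or [np.zeros_like(M)]

def gemm_via_low_bit_planes(A, B, base=4):
    """Exact A @ B computed only from GEMMs whose operands have entries
    in [0, base); the cross-plane products are recombined with powers of
    the base in wider arithmetic."""
    A_planes = digit_planes(A, base)
    B_planes = digit_planes(B, base)
    out = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i, Ai in enumerate(A_planes):
        for j, Bj in enumerate(B_planes):
            out += (base ** (i + j)) * (Ai @ Bj)  # low bit-width GEMM
    return out

# Usage: the recombined result matches the ordinary integer GEMM exactly.
rng = np.random.default_rng(0)
A = rng.integers(0, 1000, size=(8, 16))
B = rng.integers(0, 1000, size=(16, 4))
assert np.array_equal(gemm_via_low_bit_planes(A, B), A @ B)
```

The extra plane-by-plane products correspond to the "additional operations" mentioned above; IM-Unpack's contribution is to keep this overhead small in practice by unpacking only the few heavy hitter entries.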