MultiQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

CoRR (2023)

Abstract
Arbitrary bit-width network quantization has received significant attention because it can adapt to varying bit-width requirements at runtime. In this paper, however, we investigate existing methods and observe a significant accumulation of quantization errors caused by frequent bit-width switching of weights and activations, which limits their performance. To address this issue, we propose MultiQuant, a novel method that uses a multi-branch topology for arbitrary bit-width quantization. MultiQuant duplicates the network body into multiple independent branches and quantizes the weights of each branch to a fixed 2 bits while keeping the input activations at the expected bit-width. This approach keeps the computational cost unchanged while avoiding weight bit-width switching, thereby substantially reducing weight quantization errors. Additionally, we introduce an amortization branch selection strategy that distributes the quantization errors caused by activation bit-width switching among the branches to improve performance. Finally, we design an in-place distillation strategy that facilitates guidance between branches to further improve MultiQuant's performance. Extensive experiments demonstrate that MultiQuant achieves significant performance gains over existing arbitrary bit-width quantization methods. Code is available at https://github.com/zysxmu/MultiQuant.
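The claim that the computational cost stays the same can be checked with simple arithmetic. Assuming (this is our reading, not stated in the abstract) that a b-bit model activates K = b/2 branches, each with 2-bit weights and b-bit activations, the cost per multiply-accumulate is K · 2 · b = b · b BitOPs, matching a single W_b/A_b network. The Python sketch below illustrates this under further assumptions: branch outputs are summed, a generic uniform quantizer with fixed clipping ranges stands in for the paper's quantizer, and branches are taken in order rather than chosen by MultiQuant's amortization branch selection. All function names are hypothetical and do not come from the linked repository.

```python
import random


def quantize_uniform(x, bits, lo, hi):
    """Clip x to [lo, hi] and round it to one of 2**bits evenly spaced levels.

    A generic uniform quantizer used for illustration only.
    """
    levels = 2 ** bits - 1
    x = min(max(x, lo), hi)
    step = (hi - lo) / levels
    return lo + round((x - lo) / step) * step


def branches_for(bits, weight_bits=2):
    """Assumed rule: K branches such that K * weight_bits * bits == bits * bits."""
    if bits % weight_bits != 0:
        raise ValueError("bit-width must be a multiple of the branch weight bit-width")
    return bits // weight_bits


def bitops(macs, w_bits, a_bits, branches=1):
    """BitOPs of a layer: MACs x weight bits x activation bits, times branch count."""
    return branches * macs * w_bits * a_bits


def multi_branch_linear(x, branch_weights, act_bits):
    """Hypothetical multi-branch linear layer.

    Each branch owns its own full-precision weights (rows = outputs) that are
    always quantized to a fixed 2 bits; only the shared activations change
    bit-width. Activations are assumed non-negative (post-ReLU) in [0, 1].
    Branch outputs are summed (an assumption).
    """
    xq = [quantize_uniform(v, act_bits, 0.0, 1.0) for v in x]
    out = [0.0] * len(branch_weights[0])
    for weights in branch_weights:
        for j, row in enumerate(weights):
            wq = [quantize_uniform(w, 2, -1.0, 1.0) for w in row]
            out[j] += sum(a * w for a, w in zip(xq, wq))
    return out


if __name__ == "__main__":
    random.seed(0)
    in_f, out_f, max_bits = 8, 4, 8
    # One independent weight copy per branch, enough for the largest bit-width.
    all_branches = [
        [[random.uniform(-1, 1) for _ in range(in_f)] for _ in range(out_f)]
        for _ in range(branches_for(max_bits))
    ]
    x = [random.random() for _ in range(in_f)]
    macs = in_f * out_f
    for bits in (2, 4, 6, 8):
        k = branches_for(bits)
        # Naive in-order selection; MultiQuant uses amortization branch selection.
        y = multi_branch_linear(x, all_branches[:k], bits)
        assert bitops(macs, 2, bits, k) == bitops(macs, bits, bits)
        print(f"{bits}-bit: {k} branch(es), BitOPs = {bitops(macs, 2, bits, k)}")
```

Because each branch's weights are quantized to 2 bits at every bit-width, switching from, say, 4-bit to 8-bit only adds branches and raises the activation precision; no weight tensor ever changes its own bit-width.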
Keywords
network, MultiQuant, multi-branch, bit-width