BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge
CoRR (2024)
Abstract
Existing binary Transformers are promising for edge deployment due to their
compact model size, low computational complexity, and considerable inference
accuracy. However, deploying binary Transformers on prior processors faces
challenges due to inefficient execution of quantized matrix multiplication
(QMM) and the energy overhead caused by multi-precision activations. To tackle
these challenges, we first develop a computation flow abstraction method for
binary Transformers that improves QMM execution efficiency by optimizing the
computation order. Furthermore, we propose a binarized energy-efficient
Transformer accelerator, namely BETA, to enable efficient deployment at the
edge. Notably, BETA features a configurable QMM engine that accommodates the
diverse activation precisions of binary Transformers and offers high
parallelism and high speed for QMMs with impressive energy efficiency.
Experimental results on a ZCU102 FPGA show that BETA achieves an average
energy efficiency of 174 GOPS/W, 1.76x to 21.92x higher than prior FPGA-based
accelerators, demonstrating BETA's strong potential for edge Transformer
acceleration.
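
To make the QMM workload concrete, below is a minimal, illustrative sketch of
the kind of operation a configurable binary-Transformer engine must support:
a dot product between a {-1, +1} binary weight row and a multi-bit activation
vector, decomposed into per-bit-plane AND/popcount operations so the same
datapath can serve several activation precisions. This is an assumption-laden
toy in Python, not BETA's actual computation flow or engine design; the
function name and encoding are hypothetical.

```python
def binary_qmm_dot(w_bits: int, act_planes: list[int], length: int) -> int:
    """Dot product of a {-1, +1} weight row with an unsigned multi-bit
    activation vector (hypothetical encoding, not the paper's).

    w_bits:     packed weights, bit j set means weight j is +1 (LSB = lane 0)
    act_planes: activation bit-planes, least-significant plane first
    length:     number of vector lanes

    Each plane p contributes 2**p * (popcount(w & plane) - popcount(~w & plane)),
    so only AND, popcount, and shift logic is needed per precision level.
    """
    mask = (1 << length) - 1  # keep only the valid lanes
    total = 0
    for p, plane in enumerate(act_planes):
        pos = bin(w_bits & plane & mask).count("1")   # lanes with weight +1
        neg = bin(~w_bits & plane & mask).count("1")  # lanes with weight -1
        total += (pos - neg) << p
    return total


# Example: weights [+1, -1, +1, -1], 2-bit activations [3, 1, 2, 0].
# Expected dot product: (+1)*3 + (-1)*1 + (+1)*2 + (-1)*0 = 4.
w = 0b0101                 # bits 0 and 2 set => lanes 0 and 2 are +1
planes = [0b0011, 0b0101]  # LSB plane, then MSB plane of the activations
assert binary_qmm_dot(w, planes, 4) == 4
```

Running at a different activation precision only changes the number of
bit-planes iterated over, which hints at why a single configurable popcount
datapath can cover the multi-precision activations the abstract describes.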