ME-ViT: A Single-Load Memory-Efficient FPGA Accelerator for Vision Transformers
2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2024
Abstract
Vision Transformers (ViTs) have emerged as a state-of-the-art solution for
object classification tasks. However, their computational demands and high
parameter count make them unsuitable for real-time inference, prompting the
need for efficient hardware implementations. Existing hardware accelerators for
ViTs suffer from frequent off-chip memory accesses, so achievable
throughput is restricted by memory bandwidth. In devices with a high compute-to-communication
ratio (e.g., edge FPGAs with limited bandwidth), off-chip memory access imposes
a severe bottleneck on overall throughput. This work proposes ME-ViT, a novel
Memory-Efficient FPGA accelerator for ViT
inference that minimizes memory traffic. We propose a single-load
policy in designing ME-ViT: model parameters are only loaded once,
intermediate results are stored on-chip, and all operations are implemented in
a single processing element. To achieve this goal, we design a memory-efficient
processing element (ME-PE), which processes multiple key operations of ViT
inference on the same architecture through the reuse of multi-purpose
buffers. We also integrate the Softmax and LayerNorm functions into the ME-PE,
minimizing stalls between matrix multiplications. We evaluate ME-ViT on
systolic array sizes of 32 and 16, achieving up to a 9.22× and
17.89× overall improvement in memory bandwidth, and a 2.16×
improvement in throughput per DSP for both designs over state-of-the-art ViT
accelerators on FPGA. ME-ViT achieves a power efficiency improvement of up to
4.00× (1.03×) over a GPU (FPGA) baseline. ME-ViT enables up to 5
ME-PE instantiations on a Xilinx Alveo U200, achieving a 5.10×
improvement in throughput over the state-of-the-art FPGA baseline, and a
5.85× (1.51×) improvement in power efficiency over the GPU (FPGA)
baseline.
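The traffic savings of the single-load policy can be illustrated with a back-of-the-envelope model. The sketch below is not from the paper: it contrasts a naive policy that reloads weights and spills activations off-chip every layer against a single-load policy where weights cross the memory interface once and intermediates stay on-chip. The layer count and parameter counts correspond to standard ViT-Base; the traffic model itself is a deliberate simplification.

```python
# Illustrative traffic model (assumption, not the paper's methodology):
# compare off-chip bytes moved under per-layer weight reloading vs. a
# single-load policy over a batch of inferences.

LAYERS = 12                     # ViT-Base encoder depth
PARAMS_PER_LAYER = 7_087_872    # ~7.1M params per ViT-Base encoder layer
ACT_BYTES = 197 * 768 * 4       # one token-activation matrix (fp32)
BYTES_PER_PARAM = 4             # fp32 weights

def naive_traffic(num_inferences: int) -> int:
    """Weights reloaded and activations spilled off-chip at every layer."""
    per_inference = LAYERS * (PARAMS_PER_LAYER * BYTES_PER_PARAM + 2 * ACT_BYTES)
    return num_inferences * per_inference

def single_load_traffic(num_inferences: int) -> int:
    """Weights loaded once; only the input embedding and output move off-chip."""
    weights_once = LAYERS * PARAMS_PER_LAYER * BYTES_PER_PARAM
    io_per_inference = 2 * ACT_BYTES
    return weights_once + num_inferences * io_per_inference

if __name__ == "__main__":
    for n in (1, 100):
        print(f"n={n}: naive={naive_traffic(n)} single-load={single_load_traffic(n)}")
```

Under this model the single-load policy amortizes the one-time weight load across all inferences, so its advantage grows with the number of inferences served; the real accelerator's savings additionally depend on on-chip buffer capacity, which this sketch ignores.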
Keywords
Vision Transformer, FPGA Accelerator, Memory Bandwidth