A Conv-GEMM reconfigurable accelerator with WS-RS dataflow for high throughput processing

ELECTRONICS LETTERS (2024)

Abstract
Convolution and matrix operations are both important computations in deep neural networks (DNNs). However, the significant differences between convolution and matrix computation patterns make it challenging to efficiently support both convolution (Conv) and general matrix multiplication (GEMM) in a single hardware design. This paper proposes a Conv-GEMM reconfigurable accelerator architecture for high-throughput edge processing. A weight stationary-row streaming (WS-RS) dataflow scheme is proposed, which maximizes data reuse through hierarchical memory structures and flexible PE connections and supports high-throughput edge-based deep learning algorithms. Based on the proposed dataflow, a multi-scale memory access network (MMAN), a reconfigurable accumulator array (RAA), and a configurable instruction set architecture (ISA) are designed to optimize computation throughput and energy efficiency. The accelerator is designed in 65 nm technology and achieves a peak performance of 1.15 TOPS at 250 MHz, with an energy efficiency of 1.14 TOPS/W. The GEMM computation achieves an 85.7% latency improvement, and MobileNet-V1 processing achieves a throughput of 529 fps at a 256 × 224 image size with 87.15% top-5 accuracy on the ImageNet dataset.

The low-power requirement of edge devices makes it challenging to efficiently support convolution and matrix operations in DNNs. This paper proposes a reconfigurable accelerator architecture with a weight stationary-row streaming (WS-RS) dataflow scheme, which maximizes data reuse through hierarchical memory structures. The accelerator achieves a peak performance of 1.15 TOPS.
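The headline figures can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that one multiply-accumulate (MAC) counts as 2 operations, which is a common convention but is not stated in the abstract. It also derives an "implied power" by dividing peak throughput by energy efficiency, which holds only if the 1.14 TOPS/W figure was measured at peak throughput. The abstract does not say so, so treat that number as indicative. The function names are hypothetical.

```python
# Figures quoted in the abstract.
PEAK_TOPS = 1.15              # peak throughput, tera-operations per second
CLOCK_HZ = 250e6              # clock frequency, 250 MHz
EFFICIENCY_TOPS_PER_W = 1.14  # reported energy efficiency
MOBILENET_FPS = 529           # reported MobileNet-V1 throughput

# Assumption (not from the paper): one MAC is counted as 2 operations.
OPS_PER_MAC = 2


def ops_per_cycle(peak_tops: float, clock_hz: float) -> float:
    """Operations the array must complete every clock cycle to reach peak."""
    return peak_tops * 1e12 / clock_hz


def implied_power_w(peak_tops: float, tops_per_w: float) -> float:
    """Power implied only if the efficiency figure was measured at peak throughput."""
    return peak_tops / tops_per_w


def frame_time_ms(fps: float) -> float:
    """Average time per frame, in milliseconds, at a given frame rate."""
    return 1e3 / fps


if __name__ == "__main__":
    ops = ops_per_cycle(PEAK_TOPS, CLOCK_HZ)
    print(f"ops/cycle: {ops:.0f}")
    print(f"MACs/cycle (assumed 2 ops/MAC): {ops / OPS_PER_MAC:.0f}")
    print(f"implied power: {implied_power_w(PEAK_TOPS, EFFICIENCY_TOPS_PER_W):.2f} W")
    print(f"MobileNet-V1 frame time: {frame_time_ms(MOBILENET_FPS):.2f} ms")
```

Under these assumptions, peak operation at 250 MHz corresponds to about 4600 operations (roughly 2300 MACs) per cycle, about 1 W of power, and about 1.89 ms per MobileNet-V1 frame.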
Keywords
digital circuits, digital integrated circuits, image and vision processing and display technology, image processing, integrated circuit design