High-efficient MPSoC-based CNNs accelerator with optimized storage and dataflow

The Journal of Supercomputing(2021)

引用 0|浏览14
暂无评分
摘要
The convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity, which involve enormous communication bandwidth and storage resources requirement. The computation requirement can be addressed effectively to achieve high throughput by highly parallel compute paradigms of current CNNs accelerators. But the energy consumption still remains high as communication can be more expensive than computation, especially for low power embedded platform. To address this problem, this paper proposes a CNNs accelerator based on a novel storage and dataflow on multi-processor system on chip (MPSoC) platform. By minimizing data access and movement and maximizing data reuse, it can achieve the energy efficient CNNs inference acceleration. The optimization strategies mainly involve four aspects. Firstly, an external memory sharing architecture adopting two-dimensional array storage mode for CPU-FPGA collaborative processing is proposed to achieve high data throughput and low bandwidth requirement for off-chip data transmission. Secondly, the minimized data access and movement on chip are realized by designing a multi-level hierarchical storage architecture. Thirdly, a cyclic data shifting method is proposed to achieve maximized data reuse based on both spatial and temporal. In addition, a bit fusion method based on the 8-bit dynamic fixed-point quantization is adopted to achieve double throughput and computational efficiency of a single DSP. The accelerator proposed in this paper is implemented on Zynq UltraScale + MPSoC ZCU102 evaluation board. By running the benchmark network of VGG16 and Tiny-YOLO on the accelerator, the throughput and the energy efficiency are evaluated. Compared with the current typical accelerators, the proposed accelerator can increase system throughput by up to 41 x , single DSP throughput by up to 7.63x, and system energy efficiency by up to 6.3 x .
更多
查看译文
关键词
Cnns accelerator,MPSoC platform,Multilevel storage,Memory sharing,Data reuse
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要