A Block-Floating-Point Arithmetic Based FPGA Accelerator for Convolutional Neural Networks

2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)

Abstract
Convolutional neural networks (CNNs) have been widely used in computer vision applications and have achieved great success. However, large-scale CNN models usually consume substantial computing and memory resources, which makes them difficult to deploy on embedded devices. This paper proposes an efficient block-floating-point (BFP) arithmetic. Compared with 32-bit floating-point arithmetic, it reduces the memory and off-chip bandwidth requirements during convolution by 50% and 72.37%, respectively. With BFP arithmetic, the costly multiplication and addition operations on floating-point numbers are replaced by the corresponding fixed-point operations, which are far more efficient in hardware. A CNN model can be deployed on our accelerator with no more than 0.14% top-1 accuracy loss, and without retraining or fine-tuning. By employing a series of ping-pong memory access schemes, 2-dimensional propagate partial multiply-accumulate (PPMAC) processors, and an optimized memory system, we implemented a CNN accelerator on a Xilinx VC709 evaluation board. The accelerator achieves a performance of 665.54 GOP/s and a power efficiency of 89.7 GOP/s/W at a 300 MHz working frequency, significantly outperforming previous FPGA-based accelerators.
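
A minimal sketch of the block-floating-point idea described in the abstract, not the paper's exact implementation: each block of activations or weights shares one exponent, so the per-element mantissas can be stored as narrow integers and processed with fixed-point multiply-accumulates. The block size, mantissa width, and rounding mode below are illustrative assumptions.

import numpy as np

MANTISSA_BITS = 8  # illustrative choice, not taken from the paper

def bfp_quantize(block, mantissa_bits=MANTISSA_BITS):
    """Quantize a block of floats to block-floating-point:
    one shared exponent plus narrow signed-integer mantissas."""
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return np.zeros(block.shape, dtype=np.int32), 0.0
    shared_exp = int(np.floor(np.log2(max_abs)))
    # Scale so the largest element maps just inside the signed m-bit range.
    scale = 2.0 ** (shared_exp - mantissa_bits + 2)
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(block / scale), lo, hi).astype(np.int32)
    return mantissas, scale

# Example: one dot product done entirely with integer multiply-accumulates,
# followed by a single rescale that accounts for the two shared exponents.
a = np.random.randn(64).astype(np.float32)
w = np.random.randn(64).astype(np.float32)
ma, sa = bfp_quantize(a)
mw, sw = bfp_quantize(w)
acc = int(np.dot(ma.astype(np.int64), mw.astype(np.int64)))  # fixed-point MAC
approx = acc * sa * sw
print("float32:", float(np.dot(a, w)), " BFP approx:", approx)

Because every element in a block reuses the same exponent, only the integer mantissas flow through the multiply-accumulate datapath, which is the property that lets the accelerator replace floating-point units with fixed-point ones.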
Keywords
CNN, FPGA, block-floating-point