Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators

Advanced Parallel Processing Technologies (2017)

Abstract
Convolutional Neural Networks (CNNs) have been extensively employed in research fields including multimedia recognition and computer vision. Various FPGA-based accelerators for deep CNNs have been proposed to achieve high energy efficiency. For FPGA-based CNN accelerators in embedded systems, such as UAVs, IoT devices, and wearables, overall performance is often bounded by the limited data bandwidth to the on-board DRAM. In this paper, we argue that it is feasible to overcome this bandwidth bottleneck using data compression techniques. We propose an effective roofline model to explore the design tradeoff between computation logic and data bandwidth after applying data compression to the parameters of CNNs. We implement a decompression module and a CNN accelerator on a single Xilinx VC707 FPGA board, with two different compression/decompression algorithms as case studies. Under a scenario with limited data bandwidth, the peak performance of our implementation outperforms designs using previous methods by 3.2x in overall performance.
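The core idea of a compression-aware roofline model can be sketched as follows. This is an illustrative reconstruction, not the paper's actual model: the function name, parameter names, and the example numbers (100 GFLOP/s peak, 10 GB/s DRAM bandwidth, 4 ops/byte, 2x compression) are all assumptions chosen for demonstration. The key point is that compressing CNN parameters raises the effective memory-traffic ceiling, since fewer bytes cross the DRAM interface per operation.

```python
# Hypothetical sketch of a roofline model extended with a parameter
# compression ratio. All names and numbers are illustrative assumptions,
# not values taken from the paper.

def attainable_gflops(peak_gflops, bandwidth_gbs, ops_per_byte,
                      compression_ratio=1.0):
    """Attainable performance is the lesser of the compute ceiling and
    the memory-traffic ceiling.

    A compression_ratio of 2.0 means half as many bytes move over the
    DRAM interface, which doubles the effective operational intensity
    and so raises the memory-bound ceiling.
    """
    memory_bound = bandwidth_gbs * ops_per_byte * compression_ratio
    return min(peak_gflops, memory_bound)

# A bandwidth-limited design: 100 GFLOP/s of logic, 10 GB/s DRAM,
# 4 operations per byte of uncompressed parameter data.
baseline = attainable_gflops(100.0, 10.0, 4.0)         # memory-bound: 40.0
compressed = attainable_gflops(100.0, 10.0, 4.0, 2.0)  # ceiling rises: 80.0
```

In this toy setting, 2x parameter compression doubles the achievable throughput of a memory-bound design, while a compute-bound design (memory ceiling already above peak) would see no benefit; that tradeoff is what the roofline exploration navigates.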
Keywords
CNN, FPGA, Compression/decompression