A High-Performance CNN Processor Based on FPGA for MobileNets

2019 29th International Conference on Field Programmable Logic and Applications (FPL)(2019)

Abstract
Convolutional neural networks (CNNs) have been widely applied to computer vision tasks. However, standard networks are hard to deploy on embedded devices because of their large numbers of operations and parameters. MobileNet, a state-of-the-art CNN that replaces standard convolution with depthwise separable convolution, significantly reduces operations and parameters with only a limited loss in accuracy. This paper proposes a high-performance FPGA-based CNN processor. To improve efficiency, two dedicated computing engines, named Conv Engine and Dwcv Engine, were designed for pointwise convolution and depthwise convolution, respectively. The scheduling of the Conv Engine and Dwcv Engine significantly improves the accelerator's efficiency. Furthermore, we designed a special architecture called Channel Augmentation to improve efficiency in the first layer of MobileNets. The accelerator can be flexibly deployed to various devices with different configurations to balance hardware resources against computational performance. We implemented the accelerator on ZU2 and ZU9 MPSoC FPGAs. ImageNet classification achieved 205.3 frames per second (fps) on ZU2 and 809.8 fps on ZU9, a 15.4x speedup on ZU2 and a 60.7x speedup on ZU9 over a CPU. We also deployed a MobileNet + SSD network on the accelerator for object detection, achieving 31.0 fps on ZU2 and 124.3 fps on ZU9.
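As a back-of-envelope illustration of why depthwise separable convolution reduces the operation count (the layer dimensions below are hypothetical and not taken from the paper):

```python
# Rough sketch (assumed example, not from the paper): multiply-accumulate
# (MAC) counts for a standard convolution versus the depthwise separable
# convolution used in MobileNet.

def standard_conv_macs(h, w, c_in, c_out, k):
    # Each output pixel of each output channel needs k*k*c_in MACs.
    return h * w * c_out * c_in * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    # Depthwise step: one k x k filter applied per input channel.
    depthwise = h * w * c_in * k * k
    # Pointwise step: a 1x1 convolution that mixes channels.
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 112x112 feature map, 32 -> 64 channels, 3x3 kernel.
std = standard_conv_macs(112, 112, 32, 64, 3)
sep = depthwise_separable_macs(112, 112, 32, 64, 3)
print(std, sep, std / sep)  # reduction factor = 1/(1/c_out + 1/k^2) ~ 7.9x
```

The reduction factor 1/(1/c_out + 1/k^2) explains why the paper devotes separate engines to the two steps: after factorization, the 1x1 pointwise convolution dominates the remaining MACs, while the depthwise step has a very different compute-to-bandwidth ratio.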
Keywords
convolutional neural network, FPGA, hardware accelerator, MobileNet