CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs

Qijing Huang,Dequan Wang,Zhen Dong,Yizhao Gao,Yaohui Cai,Tian Li,Bichen Wu,Kurt Keutzer,John Wawrzynek

FPGA（2021）

引用 42|浏览58

暂无评分

摘要

ABSTRACTDeploying deep learning models on embedded systems for computer vision tasks has been challenging due to limited compute resources and strict energy budgets. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects, and therefore, require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable convolution to augment regular convolutions. Regular convolutions process a fixed grid of pixels across all the spatial locations in an image, while dynamic deformable convolution may access arbitrary pixels in the image with the access pattern being input-dependent and varying with spatial location. These properties lead to inefficient memory accesses of inputs with existing hardware. In this work, we harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions. We show the speed-accuracy tradeoffs for a set of algorithm modifications including irregular-access versus limited-range and fixed-shape on a flexible hardware accelerator. We evaluate these algorithmic changes with corresponding hardware optimizations and show a 1.36x and 9.76x speedup respectively for the full and depthwise deformable convolution on hardware with minor accuracy loss. We then co-design a network called CoDeNet with the modified deformable convolution for object detection and quantize the network to 4-bit weights and 8-bit activations. With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50 on the standard object detection dataset, Pascal VOC. With our higher-accuracy implementation, our model gets to 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters--20.9x smaller but 10% more accurate than Tiny-YOLO.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要