A Flexible Design Automation Tool for Accelerating Quantized Spectral CNNs

2019 29th International Conference on Field Programmable Logic and Applications (FPL), 2019

Abstract
CNNs have proven to be extremely powerful in various computer vision applications. To alleviate the computation burden and improve hardware efficiency, low-complexity convolution algorithms (e.g., spectral convolution) and data quantization schemes have been implemented on FPGAs. However, translating the reduced algorithm complexity into improved hardware performance requires significant manual tuning of mapping parameters specific to the CNN model and the target FPGA device. We propose a flexible tool to automate the process of generating high-throughput accelerators for quantized, spectral CNNs. The tool takes as input a high-level specification of the CNN model, the data quantization scheme, and the target hardware architecture. It outputs synthesizable Verilog after fast exploration of the complete design space. Our tool is flexible in three dimensions: 1) data representation, 2) FPGA architecture, and 3) CNN models. To support arbitrary quantization bit widths, we propose a resource-efficient multiplier design, which uses the fixed, high bit-width DSPs to implement the various low bit-width complex multiplications needed in spectral CNNs. To support FPGAs with limited on-chip memory, we propose a systolic array-based architecture for spectral convolution, which exploits high computation parallelism in DSPs without stressing BRAM resources. To support CNNs with various layer parameters, we tile and permute data blocks to saturate the communication and computation capacity. Finally, we propose a fast design space exploration algorithm to complete the end-to-end Verilog generation. The whole design space exploration and Verilog generation takes less than 1 second on an Intel Core i5 laptop. We evaluate the tool on Stratix-10 and Stratix-V FPGAs, using AlexNet and VGG16. The generated accelerators achieve 2× to 4× higher throughput than the state of the art, for 8-bit and 16-bit data quantization.
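The abstract does not spell out what spectral convolution computes, so the following is a minimal Python sketch of the general idea, assuming the standard convolution theorem: a circular convolution in the spatial domain equals an element-wise complex multiplication in the frequency domain. It is not the paper's accelerator design. The function names (`dft2`, `spectral_conv2d`, `circular_conv2d`) and the 4×4 tile size are hypothetical choices for illustration, and the naive DFT stands in for the FFT a real implementation would use.

```python
import cmath


def dft2(x, inverse=False):
    """Naive 2-D DFT of an n x n list of lists (O(n^4); illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0j
            for r in range(n):
                for c in range(n):
                    angle = sign * 2 * cmath.pi * (u * r + v * c) / n
                    acc += x[r][c] * cmath.exp(1j * angle)
            # The inverse transform carries the 1/n^2 normalization.
            out[u][v] = acc / (n * n) if inverse else acc
    return out


def spectral_conv2d(tile, kernel):
    """Circular convolution computed in the frequency domain."""
    n, k = len(tile), len(kernel)
    # Zero-pad the k x k kernel to the n x n tile size.
    padded = [[kernel[r][c] if r < k and c < k else 0 for c in range(n)]
              for r in range(n)]
    ft, fk = dft2(tile), dft2(padded)
    # Element-wise complex multiplications: the operations a spectral
    # CNN accelerator has to map onto hardware multipliers.
    prod = [[ft[u][v] * fk[u][v] for v in range(n)] for u in range(n)]
    return [[val.real for val in row] for row in dft2(prod, inverse=True)]


def circular_conv2d(tile, kernel):
    """Reference circular convolution computed directly in the spatial domain."""
    n, k = len(tile), len(kernel)
    return [[sum(kernel[i][j] * tile[(r - i) % n][(c - j) % n]
                 for i in range(k) for j in range(k))
             for c in range(n)]
            for r in range(n)]


if __name__ == "__main__":
    tile = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    spectral = spectral_conv2d(tile, kernel)
    direct = circular_conv2d(tile, kernel)
    max_diff = max(abs(spectral[r][c] - direct[r][c])
                   for r in range(4) for c in range(4))
    print("max difference between spectral and direct results:", max_diff)
```

Note that this sketch uses convolution in the strict sense (the kernel is flipped); CNN frameworks usually compute cross-correlation, which differs only in the kernel's orientation.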
Keywords
Convolutional Neural Networks, Design Automation, Field Programmable Gate Arrays