Fast And Efficient Implementation Of Convolutional Neural Networks On Fpga

Abhinav Podili,Chi Zhang,Viktor Prasanna

2017 IEEE 28TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP)（2017）

引用 74|浏览52

暂无评分

摘要

State-of-the-art CNN models for Image recognition use deep networks with small filters instead of shallow networks with large filters, because the former requires fewer weights. In the light of above trend, we present a fast and efficient FPGA based convolution engine to accelerate CNN models over small filters. The convolution engine implements Winograd minimal filtering algorithm to reduce the number of multiplications by 38% to 55% for state-of-the-art CNNs. We exploit the parallelism of the Winograd convolution engine to scale the overall performance. We show that our overall design sustains the peak throughput of the convolution engines. We propose a novel data layout to reduce the required memory bandwidth of our design by half. One noteworthy feature of our Winograd convolution engine is that it hides the computation latency of the pooling layer. As a case study we implement VGG16 CNN model and compare it with previous approaches. Compared with the state-of-the-art reduced precision VGG16 implementation, our implementation achieves 1.2x improvement in throughput by using 3x less multipliers and 2x less on-chip memory without impacting the classification accuracy. The improvements in throughput per multiplier and throughput per unit on-chip memory are 3.7x and 2.47x respectively, compared with the state-of-the-art design.

查看译文

关键词

Convolutional neural networks,Winograd minimal filtering algorithm,Efficient,Double buffering,Data reuse,Pipelining

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要