Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs.

DAC (2017)

Abstract
Convolutional neural networks (CNNs) have been widely applied in many deep learning applications. In recent years, FPGA implementations of CNNs have attracted much attention because of their high performance and energy efficiency. However, existing implementations have difficulty fully leveraging the computation power of the latest FPGAs. In this paper we implement CNNs on an FPGA using a systolic array architecture, which can achieve a high clock frequency under high resource utilization. We provide an analytical model for performance and resource utilization and develop an automatic design space exploration framework, as well as a source-to-source code transformation from a C program to a CNN implementation using a systolic array. The experimental results show that our framework is able to generate accelerators for real-life CNN models, achieving up to 461 GFlops for the floating-point data type and 1.2 Tops for 8-16 bit fixed point.
Keywords
source-to-source code transformation, CNN implementation, automated systolic array architecture synthesis, convolutional neural networks, deep learning applications, automatic design space exploration, FPGA, high throughput CNN inference
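
The abstract names a systolic array architecture without describing its dataflow. As a rough illustration only, the following is a minimal cycle-level sketch of a generic, textbook output-stationary systolic array computing a matrix product (the form to which convolution layers are commonly lowered). It is not the paper's architecture or generated code; the function name `systolic_matmul` and the skewed feeding schedule are assumptions made for this example.

```python
def systolic_matmul(A, B):
    """Simulate an R x C grid of processing elements (PEs) computing A @ B.

    Hypothetical textbook model, not the paper's design:
    - row i of A enters PE(i, 0) from the left, delayed by i cycles;
    - column j of B enters PE(0, j) from the top, delayed by j cycles;
    - each cycle a PE multiplies its two inputs, adds the product to a
      local accumulator, and forwards the A value right and the B value down.
    """
    R, K = len(A), len(A[0])
    C = len(B[0])
    acc = [[0] * C for _ in range(R)]    # per-PE partial sums (stay in place)
    a_reg = [[0] * C for _ in range(R)]  # A operand currently held by each PE
    b_reg = [[0] * C for _ in range(R)]  # B operand currently held by each PE
    cycles = K + R + C - 2               # cycles until the last PE finishes

    for t in range(cycles):
        # Visit PEs from bottom-right to top-left so each PE reads the
        # value its left/upper neighbour held in the previous cycle.
        for i in reversed(range(R)):
            for j in reversed(range(C)):
                if j == 0:
                    k = t - i            # skewed feed from the left edge
                    a_in = A[i][k] if 0 <= k < K else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j            # skewed feed from the top edge
                    b_in = B[k][j] if 0 <= k < K else 0
                else:
                    b_in = b_reg[i - 1][j]
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
                acc[i][j] += a_in * b_in
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

Each PE in this sketch exchanges data only with its immediate neighbours. That local-only communication is the property usually credited with letting systolic designs keep short wires and high clock rates as the array grows, in line with the abstract's claim about clock frequency under high resource utilization.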