Compute-Efficient Neural-Network Acceleration.

FPGA 2019

Abstract
To enhance the performance of FPGA-based neural-network accelerators, maximizing both operating clock rates and compute efficiency is paramount. Streamlining data movement between memory and compute holds the key to boosting these metrics. To unleash latent performance in FPGA-based inference processors, we outline a convolutional neural network accelerator that operates at 92.9% of the peak FPGA clock rate. First, we map neural-network operators to a minimalist hardware architecture to simplify data movement between memory and compute. Doing so enables the design to close timing at high clock rates. Second, we describe a schedule that keeps compute utilization high. We apply this architecture to classify MNIST, CIFAR-10, and ImageNet datasets. This design achieves 95.5% compute efficiency with GoogLeNet, whose nested topology makes creating an efficient design especially challenging.
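The abstract quotes two headline metrics: achieved clock rate as a fraction of the device's peak (92.9%) and compute efficiency (95.5%). Compute efficiency is conventionally defined as the useful multiply-accumulates (MACs) the network requires divided by the peak MACs the processing-element array could have issued over the run. The sketch below illustrates that arithmetic only; the function name, parameters, and example numbers are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the compute-efficiency metric the abstract reports.
# All names and numbers here are illustrative assumptions, not the
# paper's actual implementation or measurements.

def compute_efficiency(useful_macs: int, num_pes: int,
                       macs_per_pe_per_cycle: int, cycles: int) -> float:
    """Fraction of peak MAC capacity that did useful work.

    useful_macs: multiply-accumulates required by the network itself
    num_pes: number of processing elements in the accelerator
    macs_per_pe_per_cycle: MACs each PE can issue per clock cycle
    cycles: total clock cycles the inference took
    """
    peak_macs = num_pes * macs_per_pe_per_cycle * cycles
    return useful_macs / peak_macs


# Made-up example: a 1024-PE array at 1 MAC/PE/cycle running a
# 3.0 GMAC workload in ~3.07 million cycles.
eff = compute_efficiency(useful_macs=3_000_000_000,
                         num_pes=1024,
                         macs_per_pe_per_cycle=1,
                         cycles=3_070_000)
print(f"compute efficiency: {eff:.1%}")  # ~95.4%
```

Under this definition, idle or stalled cycles (e.g., waiting on data movement between memory and compute) directly lower efficiency, which is why the paper's schedule focuses on keeping the PE array busy.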
Keywords
Convolutional neural networks, compute efficiency, FPGA, GoogLeNet, image classification, reduced precision, tensor processing, accelerator, deep learning, reconfigurable architecture