LPAC - A Low-Precision Accelerator for CNN on FPGAs.

FPGA (2020)

Abstract
Low-bit quantization of neural networks is required on edge devices to achieve lower power consumption and higher performance. 8-bit networks consume substantial resources, while binary networks suffer accuracy degradation. We therefore propose a full-process, hardware-friendly quantization solution, 4A4W (4-bit activations and 4-bit weights), to achieve a better accuracy/resource trade-off. It contains no additional floating-point operations and achieves accuracy comparable to full precision. We also implement a low-precision accelerator for CNNs (LPAC) on Xilinx FPGAs, which takes full advantage of the DSP blocks by efficiently mapping convolutional computations. Through on-chip reassign management and resource-saving analysis, high performance can be achieved on small chips. Under the same resource budget, our 4A4W solution achieves 1.8x higher performance than 8A8W and a 2.42x increase in power efficiency. On ImageNet classification, the Top-5 accuracy gap to full precision is less than 1%. On human pose estimation, we achieve 261 frames per second on the ZU2EG, a 1.78x speedup over 8A8W, with only a 1.62% accuracy gap to full precision. This demonstrates the broad applicability of our solution.
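The abstract does not spell out the quantization procedure, but the general technique behind a 4A4W scheme is uniform low-bit quantization of tensors. As an illustration only (the per-tensor symmetric scale and clipping range below are assumptions, not the authors' actual method), a minimal sketch of symmetric 4-bit quantization might look like this:

```python
import numpy as np

def quantize_4bit(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric uniform 4-bit quantization: map floats to integers in [-8, 7].

    Assumed illustration: per-tensor scale chosen so the largest magnitude
    maps to the positive endpoint 7. Real 4A4W schemes may calibrate the
    scale differently (e.g. learned or percentile-based clipping).
    """
    scale = float(np.max(np.abs(x))) / 7.0
    if scale == 0.0:
        return np.zeros_like(x, dtype=np.int8), 1.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from 4-bit codes and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight tensor and inspect the reconstruction error.
w = np.array([0.9, -0.35, 0.05, -0.7], dtype=np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
```

Storing only the `int8`-packed codes plus one scale per tensor is what lets 4-bit weights and activations fit FPGA DSP blocks far more densely than 8-bit operands.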