Accelerator Templates and Runtime Support for Variable Precision CNN

Semantic Scholar (2017)

Abstract
Deep learning algorithms play an important role in computer vision and image recognition tasks. At the outset, a neural network can be viewed as a function approximator. Recent trends in deep learning explore lower-precision operations to increase performance and energy efficiency. To this end, in this paper we provide a variable-precision accelerator template for FPGA together with the software and runtime stacks that interface with these accelerators. We first implement a prototype CNN template supporting FP32 weights and activations. We then extend the design to support Int16, Int8, Int4, 1-bit, and ternary weights with FP32 and Int8 activations on the Intel HARPv2 platform; these templates can be extended to arbitrary precisions. We also design a runtime and API for dynamically reconfiguring the precision. In terms of performance and energy efficiency, these variable-precision template accelerators provide (24 TOPS, 545.5 GOPS/W), (6.3 TOPS, 141 GOPS/W), (3.1 TOPS, 70 GOPS/W), and (1.6 TOPS, 40 GOPS/W) for binary, Int4, Int8, and Int16 precisions, respectively. To the best of our knowledge, these measured numbers are state of the art for the Arria 10 class of FPGA devices.
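The abstract mentions a runtime and API for dynamically reconfiguring the accelerator's precision, but does not show the interface itself. The sketch below is a hypothetical host-side illustration of what such an API could look like; the type and function names (Precision, Accelerator, set_precision, run_layer) and the control-register comment are assumptions for illustration, not the paper's actual interface.

```cpp
// Hypothetical host-side sketch of a variable-precision CNN runtime.
// All names here are illustrative assumptions, not the paper's API.
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class Precision { FP32, INT16, INT8, INT4, TERNARY, BINARY };

class Accelerator {
public:
    // Reconfigure the weight/activation precision of the loaded template at runtime.
    void set_precision(Precision weights, Precision activations) {
        // The abstract pairs the low-precision weight formats with FP32 or Int8 activations.
        if (activations != Precision::FP32 && activations != Precision::INT8)
            throw std::invalid_argument("activations must be FP32 or Int8");
        weight_prec_ = weights;
        act_prec_ = activations;
        // A real runtime would program the accelerator's control registers here
        // (e.g. over the HARPv2 host-FPGA interface) to select the matching
        // compute lanes; omitted in this sketch.
    }

    // Execute one convolution layer using the currently selected precision.
    void run_layer(const std::vector<uint8_t>& packed_weights,
                   const std::vector<uint8_t>& packed_activations,
                   std::vector<uint8_t>& packed_output) {
        // DMA inputs to device buffers, start the layer, wait for completion,
        // and copy the result back; all elided here.
        (void)packed_weights;
        (void)packed_activations;
        packed_output.clear();
    }

private:
    Precision weight_prec_ = Precision::FP32;
    Precision act_prec_   = Precision::FP32;
};

int main() {
    Accelerator acc;
    // Switch precision dynamically between layers or networks.
    acc.set_precision(Precision::INT4, Precision::INT8);
    std::vector<uint8_t> weights, activations, output;
    acc.run_layer(weights, activations, output);
    return 0;
}
```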