DNNVM - End-to-End Compiler Leveraging Operation Fusion on FPGA-based CNN Accelerators.

FPGA (2019)

Citations: 6 | Views: 115
Abstract
In recent years, Convolutional Neural Networks (CNNs) have become the state-of-the-art method in a wide range of Artificial Intelligence (AI) domains. Increasingly large and complex CNN models are both computation bound and I/O bound. FPGA-based accelerators driven by a custom Instruction Set Architecture (ISA) strike a balance between generality and efficiency, and leave much room for optimization. Operation fusion, which fuses adjacent operations without writing intermediate results back to off-chip DDR, can greatly alleviate bandwidth pressure; fused operations can also be executed concurrently by different computation engines to hide latency. To leverage such optimizations, especially operation fusion, on custom instruction-based accelerators, we propose a full-stack compiler, DNNVM (Deep Neural Network Virtual Machine). DNNVM integrates optimizers for the framework-independent computation graph, loops, and data layouts, together with an assembler, a runtime supporter, and a validation environment. DNNVM works in the context of deep learning frameworks and transforms CNN models into a directed acyclic graph, XGraph. After analyzing the interaction among fusion depth, tiling across multiple stages, and on-chip memory capacity, DNNVM enumerates all potentially profitable fusion opportunities on XGraph according to custom fusion templates, using a subgraph isomorphism algorithm. In addition, DNNVM searches for the optimal execution strategy with a heuristic shortest-path algorithm. On Xilinx [email protected], we achieve up to a 1.26x speedup over naïve implementations without fusion on GoogLeNet. On Xilinx [email protected], we achieve a throughput of 2.82 TOPs/s for VGG and 1.38 TOPs/s for ResNet50 - the fastest ever reported on comparable FPGAs.
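The abstract's idea of enumerating template-based fusion candidates and then choosing an execution strategy via a shortest-path search can be illustrated on a simplified linear operator chain. The sketch below is not the DNNVM implementation; the template set, cost numbers, and function names are all hypothetical. Each fusion template matching at position `i` becomes an edge from node `i` to node `i + len(template)` in a DAG of chain positions, and a simple dynamic program finds the cheapest path (i.e., the cheapest grouping of operations into fused stages).

```python
# Illustrative sketch only (not DNNVM's actual algorithm or cost model):
# pick an execution strategy for a linear chain of operators by treating
# each fusion-template match as an edge in a DAG and running a
# shortest-path dynamic program over chain positions.

# Hypothetical fusion templates with made-up execution costs; deeper
# fusion is cheaper than the sum of its parts because intermediate
# results stay in on-chip memory instead of going to off-chip DDR.
FUSION_TEMPLATES = {
    ("conv",): 10.0,
    ("relu",): 4.0,
    ("pool",): 5.0,
    ("conv", "relu"): 11.0,
    ("conv", "relu", "pool"): 13.0,
}

def best_plan(ops):
    """Return (min_cost, chosen_groups) for a chain of ops.

    dp[i] = cheapest cost to execute ops[i:]; a template of length k
    matching at i is an edge i -> i + k with the template's cost.
    """
    n = len(ops)
    dp = [float("inf")] * (n + 1)
    choice = [None] * (n + 1)
    dp[n] = 0.0
    for i in range(n - 1, -1, -1):
        for tmpl, cost in FUSION_TEMPLATES.items():
            k = len(tmpl)
            if tuple(ops[i:i + k]) == tmpl and cost + dp[i + k] < dp[i]:
                dp[i] = cost + dp[i + k]
                choice[i] = tmpl
    # Walk the recorded choices to reconstruct the chosen grouping.
    groups, i = [], 0
    while i < n:
        groups.append(choice[i])
        i += len(choice[i])
    return dp[0], groups

cost, groups = best_plan(["conv", "relu", "pool", "conv", "relu"])
print(cost, groups)  # fuses conv+relu+pool, then conv+relu
```

The real problem is harder: XGraph is a general DAG rather than a chain, templates are matched by subgraph isomorphism, and costs depend on tiling and on-chip memory capacity, which is why the paper resorts to a heuristic shortest-path search instead of exhaustive enumeration.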