Vyasa: A High-Performance Vectorizing Compiler for Tensor Convolutions on the Xilinx AI Engine
2020 IEEE High Performance Extreme Computing Conference (HPEC), 2020
Abstract
Xilinx's AI Engine is a recent industry example of energy-efficient vector processing that includes novel support for 2D SIMD datapaths and a shuffle interconnection network. The current approach to programming the AI Engine relies on a C/C++ API for vector intrinsics. While this is an advance over assembly-level programming, it requires the programmer to specify a number of low-level operations based on detailed knowledge of the hardware. To address these challenges, we introduce Vyasa, a new programming system that extends the Halide DSL compiler to automatically generate code for the AI Engine. We evaluated Vyasa on 36 CONV2D workloads and achieved geometric means of 7.6 and 24.2 MACs/cycle for 32-bit and 16-bit operands (which represent 95.9% and 75.6% of the peak performance, respectively).
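To make the MACs/cycle metric concrete, the following is a minimal scalar sketch, not the paper's method or Vyasa's generated code. It assumes a single-channel, stride-1, unpadded CONV2D, and every function name in it is hypothetical. It counts the multiply-accumulate operations of a reference convolution and back-derives the peak throughput implied by the figures quoted above. The derived peaks are approximate, since the abstract reports geometric means rather than per-workload values.

```python
# Illustrative sketch only. The layout (H x W input, KH x KW filter, stride 1,
# no padding) and all names are assumptions, not taken from the paper.

def conv2d_reference(image, kernel):
    """Return (output, mac_count) for a stride-1, unpadded 2D convolution."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for y in range(out_h):
        for x in range(out_w):
            acc = 0
            for r in range(kh):
                for s in range(kw):
                    acc += image[y + r][x + s] * kernel[r][s]  # one MAC
                    macs += 1
            output[y][x] = acc
    return output, macs


def macs_per_cycle(mac_count, cycles):
    """Throughput metric used in the abstract: MACs retired per cycle."""
    return mac_count / cycles


def implied_peak(achieved, fraction_of_peak):
    """Peak MACs/cycle implied by an achieved rate and its share of peak."""
    return achieved / fraction_of_peak


if __name__ == "__main__":
    image = [[1] * 4 for _ in range(4)]
    kernel = [[1] * 3 for _ in range(3)]
    out, macs = conv2d_reference(image, kernel)
    print(out, macs)
    # Peaks back-derived from the abstract's numbers (approximate).
    print(round(implied_peak(7.6, 0.959), 2))   # 32-bit operands
    print(round(implied_peak(24.2, 0.756), 2))  # 16-bit operands
```

Under these assumptions, the quoted figures are consistent with a peak of roughly 8 MACs/cycle for 32-bit operands and roughly 32 MACs/cycle for 16-bit operands.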
Keywords
36 CONV2D workloads, Vyasa, high-performance vectorizing compiler, tensor convolutions, Xilinx AI Engine, recent industry example, energy-efficient vector processing, 2D SIMD datapaths, shuffle interconnection network, vector intrinsics, assembly-level programming, low-level operations, programming system, Halide DSL compiler