XVDPU: A High Performance CNN Accelerator on Versal Platform Powered by AI Engine (Just Accepted)

ACM Transactions on Reconfigurable Technology and Systems (2022)

Abstract
Convolutional neural networks (CNNs) are now widely used in computer vision applications. However, the trend toward higher accuracy and higher resolution produces larger networks. The computation or I/O requirements are the key bottlenecks. In this paper, we propose XVDPU, an AI-Engine (AIE)-based CNN accelerator on Versal chips, to meet heavy computation requirements. To resolve the I/O bottleneck, we adopt several techniques to improve data reuse and reduce I/O requirements. We further propose an Arithmetic Logic Unit (ALU) that better balances resource utilization, new feature support, and efficiency of the whole system. We have successfully deployed more than 100 CNN models with our accelerator. Our experimental results show that the 96-AIE-core implementation achieves 1653 frames per second (FPS) for ResNet50 on VCK190, which is 9.8× faster than the design on ZCU102 running at 168.5 FPS. The 256-AIE-core implementation further achieves 4050 FPS. We propose a tiling strategy to achieve feature-map-stationary (FMS) for high-definition CNN (HD-CNN) with the accelerator, achieving a 3.8× FPS improvement on Residual Channel Attention Network (RCAN) and 3.1× on Super-Efficient Super-Resolution (SESR). This accelerator can also handle the 3D convolution task in disparity estimation, achieving end-to-end (E2E) performance of 10.1 FPS with all the optimizations.
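The throughput figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not from the paper: the function names and the "scaling efficiency" metric are illustrative assumptions, and only the FPS values and core counts come from the abstract.

```python
# ResNet50 throughput figures as quoted in the abstract.
FPS_ZCU102 = 168.5            # baseline design on ZCU102
FPS_VCK190_96_CORES = 1653.0  # 96-AIE-core implementation on VCK190
FPS_256_CORES = 4050.0        # 256-AIE-core implementation


def speedup(new_fps: float, baseline_fps: float) -> float:
    """Ratio of the new throughput to the baseline throughput."""
    return new_fps / baseline_fps


def fps_per_core(fps: float, cores: int) -> float:
    """Average throughput contributed by each AIE core (hypothetical metric)."""
    return fps / cores


def scaling_efficiency(fps_small: float, cores_small: int,
                       fps_large: float, cores_large: int) -> float:
    """Throughput gain divided by core-count gain (1.0 means linear scaling).

    This metric is an assumption added for illustration; the abstract
    does not report it.
    """
    return (fps_large / fps_small) / (cores_large / cores_small)


if __name__ == "__main__":
    print(f"Speedup over ZCU102: {speedup(FPS_VCK190_96_CORES, FPS_ZCU102):.1f}x")
    print(f"FPS per core (96):   {fps_per_core(FPS_VCK190_96_CORES, 96):.2f}")
    print(f"FPS per core (256):  {fps_per_core(FPS_256_CORES, 256):.2f}")
    eff = scaling_efficiency(FPS_VCK190_96_CORES, 96, FPS_256_CORES, 256)
    print(f"Scaling efficiency:  {eff:.2f}")
```

The first ratio reproduces the 9.8× speedup stated in the abstract. The per-core and scaling figures are derived values only and say nothing about clock rates or memory bandwidth, which the abstract does not give.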
Keywords
ACAP, Acceleration, AI Engine, ALU engine, CNN, FPGA, Hardware Heterogeneous architecture, Versal