Efficient Hardware/Software Implementation for GoogLeNet Using Xilinx SDSoC

2022 4th Novel Intelligent and Leading Emerging Sciences Conference (NILES) (2022)

Abstract
Convolutional Neural Networks (CNNs) have recently been deployed in many applications. The large number of network parameters and the compute-intensive operations in CNN models make it challenging to reach the desired performance on general-purpose processors. Therefore, various hardware accelerators for deep CNNs have recently been developed to improve throughput. Field Programmable Gate Array (FPGA)-based accelerators are the most commonly used, and they are implemented with approaches such as Register Transfer Level (RTL) design or High-Level Synthesis (HLS). In this work, the Hardware/Software (HW/SW) Co-design Partitioning methodology is introduced as a way to reduce design time and shorten time to market for CNN implementations. This work focuses on implementing the GoogLeNet CNN. The Xilinx Software-Defined System on Chip (SDSoC) tool is used to achieve the HW/SW co-design by moving the most computationally intensive components to the FPGA while the rest of the network runs on an embedded Central Processing Unit (CPU). Experiments are run on the Xilinx Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit. Experimental results show a speedup of 48x with 32-bit floating-point data, at a total on-chip power consumption of 3.8 W. In addition, the proposed accelerator uses 40% fewer hardware resources than the corresponding RTL accelerator.
Keywords
CNNs, FPGA, GoogLeNet, Hardware Accelerators, HLS, Loop Tiling, Loop Transformation, SDSoC
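
The abstract names loop tiling as one of the loop transformations involved but does not show a loop nest. As a rough illustration only, the following C++ sketch tiles a convolution layer over output channels and output rows, which is the kind of restructuring that lets each tile fit in limited on-chip memory. All names here (ConvShape, conv_reference, conv_tiled, the tile sizes) are hypothetical and are not taken from the paper; the sketch assumes stride 1, no padding, and plain C++ with no SDSoC or HLS pragmas.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layer description (illustrative, not from the paper).
struct ConvShape {
    int in_ch;   // input channels
    int out_ch;  // output channels
    int height;  // output height
    int width;   // output width
    int k;       // square kernel size; stride 1, no padding
};

// Computes one output element. Layouts are assumed to be:
//   in : in_ch  x (height + k - 1) x (width + k - 1)
//   w  : out_ch x in_ch x k x k
static float output_pixel(const ConvShape& s, const std::vector<float>& in,
                          const std::vector<float>& w, int oc, int y, int x) {
    const int ih = s.height + s.k - 1;
    const int iw = s.width + s.k - 1;
    float acc = 0.0f;
    for (int ic = 0; ic < s.in_ch; ++ic)
        for (int ky = 0; ky < s.k; ++ky)
            for (int kx = 0; kx < s.k; ++kx)
                acc += in[(ic * ih + (y + ky)) * iw + (x + kx)] *
                       w[((oc * s.in_ch + ic) * s.k + ky) * s.k + kx];
    return acc;
}

// Untiled reference loop nest.
std::vector<float> conv_reference(const ConvShape& s, const std::vector<float>& in,
                                  const std::vector<float>& w) {
    assert(in.size() == static_cast<std::size_t>(s.in_ch) *
                            (s.height + s.k - 1) * (s.width + s.k - 1));
    assert(w.size() == static_cast<std::size_t>(s.out_ch) * s.in_ch * s.k * s.k);
    std::vector<float> out(static_cast<std::size_t>(s.out_ch) * s.height * s.width);
    for (int oc = 0; oc < s.out_ch; ++oc)
        for (int y = 0; y < s.height; ++y)
            for (int x = 0; x < s.width; ++x)
                out[(oc * s.height + y) * s.width + x] = output_pixel(s, in, w, oc, y, x);
    return out;
}

// Tiled loop nest: the outer loops walk tiles of output channels and rows,
// the inner loops process a single tile. On an accelerator, the data for one
// tile would typically be staged in on-chip buffers before the inner loops run.
std::vector<float> conv_tiled(const ConvShape& s, const std::vector<float>& in,
                              const std::vector<float>& w, int tile_oc, int tile_h) {
    assert(tile_oc > 0 && tile_h > 0);
    std::vector<float> out(static_cast<std::size_t>(s.out_ch) * s.height * s.width);
    for (int oc0 = 0; oc0 < s.out_ch; oc0 += tile_oc) {
        const int oc1 = std::min(oc0 + tile_oc, s.out_ch);  // clamp the last tile
        for (int y0 = 0; y0 < s.height; y0 += tile_h) {
            const int y1 = std::min(y0 + tile_h, s.height);
            for (int oc = oc0; oc < oc1; ++oc)
                for (int y = y0; y < y1; ++y)
                    for (int x = 0; x < s.width; ++x)
                        out[(oc * s.height + y) * s.width + x] =
                            output_pixel(s, in, w, oc, y, x);
        }
    }
    return out;
}
```

Tiling only changes the order in which output elements are visited, so conv_tiled should produce the same values as conv_reference for any positive tile sizes, including sizes that do not divide the layer dimensions evenly. Which loops the paper actually tiles, and with what tile sizes, is not stated in the abstract.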