FPGA Design of Transposed Convolutions for Deep Learning Using High-Level Synthesis

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY(2023)

引用 0|浏览0
暂无评分
摘要
Deep Learning (DL) is pervasive across a wide variety of domains. Convolutional Neural Networks (CNNs) are often used for image processing DL applications. Modern CNN models are growing to meet the needs of more sophisticated tasks, e.g. using Transposed Convolutions (TCONVs) for image decompression and image generation. Such state-of-the-art DL models often target GPU-based high-performance architectures, due to the high computational and hardware resource needs of TCONV layers. To avoid prohibitive GPU energy costs, CNNs are increasingly deployed to decentralized embedded autonomous devices, such as Field Programmable Gate Arrays (FPGAs). However, this poses challenges for designing efficient hardware implementations of TCONV layers. This paper presents a parameterized design and implementation of a new TCONV module, which is synthesizable onto FPGAs. It is implemented using the High-Level Synthesis (HLS), through a C++ template to parameterize its functional and non-functional properties. These parameters allow kernel sizes, image sizes, quantization and parallelism to be varied by users. With a systematic exploration in this design space, we find an optimal instance of this TCONV module that achieves 6.25 Giga Outputs per Second ( Gout/s ) using just 1.53 W of power. We then use our TCONV layer in two neural networks for image decompression and image generation. Image decompression achieves a speed throughput of more than 30K frames-per-second ( fps ) using only the 16% of resources on average, image generation achieves an energy efficiency of 324 fps /W and outperforms comparable state-of-the-art models by at least 7.3×.
更多
查看译文
关键词
Transposed Convolution,Deep Learning,FPGA,High-Level Synthesis,Quantization,Parallelism
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要