A 40-nm 91-mW, 90-fps Learning-Based Full HD Super-Resolution Accelerator
IEEE Journal of Solid-State Circuits (2023)
Abstract
Super-resolution is used in many applications to provide a better visual experience. To meet high-throughput and low-power requirements, several dedicated super-resolution accelerators have been proposed. Neural-network (NN)-based super-resolution accelerators achieve impressive restoration performance, but their high computational complexity prevents them from reaching the throughput needed for video streaming. This work presents a super-resolution accelerator that implements the rapid and accurate image super-resolution (RAISR) algorithm for reconstructing super-resolution images. The proposed memory scheduling scheme increases the utilization of the low-resolution (LR) upscaler by 50%. Kernel compression reduces the overall on-chip memory by 72%. A patch reuse scheme achieves a 91% reduction in external memory access times compared to the direct-mapped design. The architecture can flexibly reconstruct full HD images with a variety of upscaling factors ($2\times$, $3\times$, $4\times$). Fabricated in a 40-nm CMOS technology, the proposed super-resolution accelerator integrates 3.11 M gates in a core area of 3.33 mm$^2$. The chip delivers a throughput of 90 frames/s (fps) for all supported upscaling factors and dissipates 91 mW at 200 MHz. Compared with state-of-the-art designs, this work achieves a 5.4-to-$28.4\times$ higher normalized throughput with 5.1-to-$36\times$ lower normalized energy dissipation.
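To put the headline figures in perspective, the short sketch below derives raw per-pixel numbers from the values quoted in the abstract (90 fps, 91 mW, 200 MHz). It assumes that "full HD" refers to 1920×1080 output frames, which the abstract does not state explicitly; the function name `derived_metrics` is hypothetical. These are unnormalized quantities and are not the normalized throughput or energy metrics the authors use for comparison.

```python
# Figures quoted in the abstract.
FPS = 90            # frames per second
POWER_W = 0.091     # 91 mW
CLOCK_HZ = 200e6    # 200 MHz

# Assumption (not stated in the abstract): full HD output is 1920x1080 pixels.
WIDTH, HEIGHT = 1920, 1080


def derived_metrics(width, height, fps, power_w, clock_hz):
    """Derive raw throughput and energy figures from the headline numbers."""
    pixels_per_frame = width * height
    pixels_per_second = pixels_per_frame * fps
    return {
        "pixels_per_second": pixels_per_second,
        "pixels_per_cycle": pixels_per_second / clock_hz,
        "energy_per_frame_j": power_w / fps,
        "energy_per_pixel_j": power_w / pixels_per_second,
    }


if __name__ == "__main__":
    m = derived_metrics(WIDTH, HEIGHT, FPS, POWER_W, CLOCK_HZ)
    print(f"Output pixel rate : {m['pixels_per_second'] / 1e6:.1f} Mpixel/s")
    print(f"Pixels per cycle  : {m['pixels_per_cycle']:.2f}")
    print(f"Energy per frame  : {m['energy_per_frame_j'] * 1e3:.2f} mJ")
    print(f"Energy per pixel  : {m['energy_per_pixel_j'] * 1e9:.2f} nJ")
```

Under that assumption, the chip produces about 186.6 Mpixel/s, or roughly 0.93 output pixels per clock cycle, at about 1.01 mJ per frame and 0.49 nJ per output pixel.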
Keywords
CMOS integrated circuits, energy-efficient architecture, hardware accelerator, machine learning, super-resolution