NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA.

DATE (2023)

Abstract
Recently, algorithm and hardware co-design for neural networks (NNs) has become the key to obtaining high-quality solutions. However, prior works lack consideration of the underlying hardware and thus suffer from a severely unbalanced neural architecture and hardware architecture search (NA-HAS) space on FPGAs, failing to unleash the performance potential. At the same time, a deeper joint search leads to a larger (multiplicative) search space, making the search highly challenging. To this end, we propose NAF, an efficient differentiable search framework that jointly searches the networks (e.g., operations and bitwidths) and accelerators (e.g., heterogeneous multicores and mappings) under a balanced NA-HAS space. Concretely, we design a coarse-grained, hardware-friendly quantization algorithm and integrate it at block granularity into the co-search process. Meanwhile, we design a highly optimized block processing unit (BPU) with configurable key dataflows. A dynamic hardware generation algorithm based on modeling and heuristic rules then performs the critical HAS and quickly generates hardware feedback. Experimental results show that, compared with previous state-of-the-art (SOTA) co-design works, NAF improves throughput by 1.99x to 6.84x on the Xilinx ZCU102 and energy efficiency by 17% to 88% under similar accuracy on the ImageNet dataset.
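The differentiable joint search over discrete choices (e.g., per-block bitwidths) is typically realized with a softmax relaxation of architecture logits, combined with a modeled hardware cost penalty. The sketch below illustrates that general idea only; it is not NAF's actual algorithm, and the candidate bitwidths, accuracy proxies, latency model, and penalty weight `lam` are all hypothetical numbers chosen for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over architecture logits.
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical candidate bitwidths for one block and their modeled latencies.
bitwidths = np.array([4, 6, 8])
latency = np.array([1.0, 1.5, 2.0])     # assumed per-candidate hardware cost
acc_proxy = np.array([0.80, 0.90, 0.95])  # assumed accuracy proxy per candidate
lam = 0.15                                # latency penalty weight (assumption)

alpha = np.zeros(3)  # architecture logits, learned by gradient ascent
lr = 0.5

for _ in range(200):
    p = softmax(alpha)
    score = acc_proxy - lam * latency     # accuracy minus latency penalty
    expected = p @ score                  # differentiable expected objective
    # Gradient of (p @ score) w.r.t. alpha via the softmax Jacobian:
    grad = p * (score - expected)
    alpha += lr * grad                    # ascend toward a better trade-off

best = int(bitwidths[np.argmax(alpha)])   # discretize after search
```

With these illustrative numbers, the penalty steers the relaxed search away from the widest bitwidth toward a middle-ground choice, mirroring how a latency-aware objective balances accuracy against hardware cost.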
Keywords
BPU,CNN,co-search process,coarse-grained hardware-friendly quantization algorithm,dynamic hardware generation algorithm,energy efficiency,FPGA,hardware co-design,hardware feedback,heuristic rules,highly optimized block processing unit,ImageNet dataset,NA-HAS space,NAF differentiable search framework,neural architecture and hardware architecture search,neural networks,Xilinx ZCU102