L-MPC: A LUT based Multi-Level Prediction-Correction Architecture for Accelerating Binary-Weight Hourglass Network

2019 56th ACM/IEEE Design Automation Conference (DAC) (2019)

Abstract
A binary-weight hourglass network (B-HG) accelerator for landmark detection, built on the proposed look-up-table (LUT) based multi-level prediction-correction approach, enables high-speed and energy-efficient processing on IoT edge devices. First, a LUT with a unified mode is adopted to support convolutional neural networks with fully variable weight bit precision and to minimize the operations of B-HG; it achieves a $1.33\times$ to $1.50\times$ speedup on multi-bit-weight CNNs relative to a similar solution. Second, a multi-level prediction-correction model is proposed to achieve computationally efficient convolution with adaptive precision; the number of operations it saves is about 30% higher than with the two-stage model. In addition, combining these two methods saves nearly 77.4% of the operations in B-HG, yielding a $2.3\times$ inference speedup. Third, a block-computing-based pipeline is designed to mitigate the residual block deficiency in B-HG. It reduces off-chip memory access by about 66.2% relative to the baseline, and saves 60% of on-chip memory space and 31% of on-chip memory access compared with a similar fused-layer accelerator. The proposed B-HG accelerator achieves 450 fps at 500 MHz in simulation in a TSMC 28 nm process. Its power efficiency reaches up to 8.5 TOPS/W, which is two orders of magnitude higher than that of the dedicated face landmark detection accelerator.
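The abstract does not describe how the LUT is organized, so the following is only a generic sketch of the idea behind LUT-based binary-weight arithmetic, not the L-MPC design itself. It assumes weights in {+1, -1}, an illustrative group size of 4, and hypothetical helper names (`build_lut`, `pack_weights`, `lut_dot`). For each group of activations, all signed sums are precomputed once; every weight row then replaces its multiply-accumulates for that group with a single table lookup.

```python
GROUP = 4  # assumed group size, chosen for illustration only


def build_lut(acts):
    """Precompute every signed sum of one activation group.

    Entry idx holds sum_i s_i * acts[i], with s_i = +1 when bit i of idx
    is set and s_i = -1 otherwise.
    """
    k = len(acts)
    lut = []
    for idx in range(1 << k):
        total = 0
        for i in range(k):
            total += acts[i] if (idx >> i) & 1 else -acts[i]
        lut.append(total)
    return lut


def pack_weights(w_group):
    """Pack a group of +1/-1 weights into a LUT index (bit i set = +1)."""
    idx = 0
    for i, w in enumerate(w_group):
        if w > 0:
            idx |= 1 << i
    return idx


def lut_dot(acts, weight_rows, group=GROUP):
    """Dot products of one activation vector with many binary weight rows.

    Each group's table is built once and shared by all rows, so a row
    needs one lookup per group instead of `group` multiply-accumulates.
    """
    if len(acts) % group != 0:
        raise ValueError("activation length must be a multiple of group")
    starts = range(0, len(acts), group)
    luts = [build_lut(acts[s:s + group]) for s in starts]
    return [
        sum(lut[pack_weights(row[s:s + group])] for lut, s in zip(luts, starts))
        for row in weight_rows
    ]
```

In this sketch, the cost of building the tables is amortized over the weight rows (for example, the output channels of a convolution) that share the same activations. In hardware the weight indices would typically be packed offline rather than per call as done here.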
Keywords
LUT, binary-weight hourglass network accelerator, look-up-table based multilevel prediction-correction approach, energy-efficient processing, IoT edge devices, convolutional neural network, fully variable weight bit precision, multibit weight CNN, multilevel prediction-correction model, adaptive precision, block computing based pipeline, on-chip memory space, B-HG accelerator, face landmark detection accelerator, fused-layer accelerator, multilevel prediction-correction architecture, L-MPC, high-speed processing, unified mode, two-stage model, residual block deficiency, TSMC process, power efficiency, frequency 500.0 MHz, size 28.0 nm