UNPU: A 50.6TOPS/W Unified Deep Neural Network Accelerator with 1b-to-16b Fully-Variable Weight Bit-Precision

2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018

Abstract
Deep neural network (DNN) accelerators [1-3] have been proposed to accelerate deep learning algorithms, from face recognition to emotion recognition, in mobile or embedded environments [3]. However, most works accelerate only the convolutional layers (CLs) or fully-connected layers (FCLs), and other DNN types, such as those containing recurrent layers (RLs), which are useful for emotion recognition, have not been supported in hardware. A combined CNN-RNN accelerator [1], separately optimizing the computation-dominant CLs and the memory-dominant RLs or FCLs, was reported to increase overall performance; however, the number of processing elements (PEs) for CLs and RLs was limited by their area, and consequently performance was suboptimal in scenarios requiring only CLs or only RLs. Although the PEs for RLs can be reconfigured into PEs for CLs, or vice versa, only a partial reconfiguration was possible, resulting in marginal performance improvement. Moreover, previous works [1-2] supported only a limited set of weight bit-precisions, such as 4b, 8b, or 16b. However, lower weight bit-precisions can achieve better throughput and higher energy efficiency, and the optimal bit-precision varies with accuracy/performance requirements. Therefore, a unified DNN accelerator with fully-variable weight bit-precision is required for energy-optimal operation of DNNs in a mobile environment.
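The fully-variable weight bit-precision described above is commonly realized by processing weights bit-serially: a dot product is decomposed into per-bit-plane partial sums, so the same datapath handles any precision from 1b to 16b, and runtime scales with the chosen bit-width. The sketch below illustrates that decomposition in plain Python; it is a hedged, minimal model of the idea, not the paper's actual lookup-table-based hardware implementation, and the function name and signature are illustrative only.

```python
def bit_serial_mac(activations, weights, weight_bits):
    """Dot product computed one weight bit-plane at a time.

    `weights` are signed integers representable in `weight_bits` bits
    (two's complement); `activations` are plain integers. Lowering
    `weight_bits` directly cuts the number of bit-plane passes, which
    models how lower precision yields higher throughput.
    """
    acc = 0
    for b in range(weight_bits):
        # Partial sum for bit-plane b: sum the activations whose weight
        # has bit b set, then weight the sum by the bit's significance.
        partial = sum(a for a, w in zip(activations, weights) if (w >> b) & 1)
        if b == weight_bits - 1:
            acc -= partial << b  # MSB has negative weight in two's complement
        else:
            acc += partial << b
    return acc

# 4b weights: 1*2 + 2*(-1) + 3*3 = 9
print(bit_serial_mac([1, 2, 3], [2, -1, 3], 4))  # -> 9
```

The loop runs `weight_bits` times regardless of the values, so a 4b configuration needs four passes where a 16b one needs sixteen; this is the throughput/precision trade-off the abstract refers to.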
Keywords
fully-variable weight bit-precision, deep neural network accelerators, deep learning algorithms, face recognition, emotion recognition, mobile environments, embedded environments, FCLs, recurrent layers, combined CNN-RNN accelerator, marginal performance improvement, weight bit-precisions, lower weight bit-precisions, optimal bit-precision, unified DNN accelerator, mobile environment, DNN, TOPS/W unified deep neural network accelerator, computation-dominant CL, PE, memory-dominant RL