A 65nm 4Kb Algorithm-Dependent Computing-in-Memory SRAM Unit-Macro with 2.3ns and 55.8TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors

2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018

Cited by 241 | Views 99
Abstract
For deep-neural-network (DNN) processors [1-4], the product-sum (PS) operation dominates the computational workload of both convolution (CNVL) and fully-connected (FCNL) neural-network (NN) layers. This hinders the adoption of DNN processors in edge artificial-intelligence (AI) devices, which require low-power, low-cost, and fast inference. Binary DNNs [5-6] reduce computation and hardware costs for AI edge devices; however, a memory bottleneck remains. In Fig. 31.5.1, conventional PE arrays exploit parallelized computation, but suffer from inefficient single-row SRAM access to weights and intermediate data. Computing-in-memory (CIM) improves efficiency by enabling parallel computing, reducing memory accesses, and suppressing intermediate data. Nonetheless, three critical challenges remain (Fig. 31.5.2), particularly for FCNL. We overcome these problems by co-optimizing the circuits and the system. Recent research has focused on XNOR-based binary-DNN structures [6]. Although they achieve slightly higher accuracy than other binary structures, they incur significant hardware cost (i.e., 8T-12T SRAM cells) to implement a CIM system. To further reduce hardware cost by implementing the CIM system with 6T SRAM, we employ a binary DNN with 0/1 neurons and ±1 weights, as proposed in [7]. We implemented a 65nm 4Kb algorithm-dependent CIM-SRAM unit-macro and an in-house binary DNN structure (focusing on FCNL with a simplified PE array) for cost-aware DNN AI edge processors. This results in the first binary-based CIM-SRAM macro with the fastest (2.3ns) PS operation and the highest energy efficiency (55.8TOPS/W) among reported CIM macros [3-4].
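To make the targeted operation concrete, the sketch below illustrates the fully parallel product-sum for a fully-connected layer with 0/1 neurons and ±1 weights, the binary DNN form of [7] that the macro supports. This is a minimal functional model, not the authors' circuit: the layer sizes, threshold, and variable names are illustrative assumptions, and the analog current summing along the SRAM bitlines is replaced here by an ordinary integer dot product.

```python
# Minimal sketch (assumed names and sizes): fully-connected product-sum
# with 0/1 neuron activations and +/-1 weights, as in the binary DNN of [7].
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_outputs = 64, 16                          # hypothetical FCNL dimensions
x = rng.integers(0, 2, size=n_inputs)                 # 0/1 neuron activations (inputs)
W = rng.choice([-1, 1], size=(n_outputs, n_inputs))   # +/-1 weights stored in the SRAM array

# Fully parallel product-sum: each output accumulates sum_i(w_ji * x_i).
# In the CIM macro this accumulation happens in parallel on the bitlines;
# here it is simply a matrix-vector product.
ps = W @ x

# A binary neuron then thresholds the product-sum to produce the next
# layer's 0/1 activations (threshold of 0 is an illustrative choice).
y = (ps >= 0).astype(int)
print(ps, y)
```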
Keywords
FCNL NN layer, convolution neural-network, parallelized computation, intermediate data suppression, XNOR based binary-DNN structures, hardware cost reduction, fully-connect neural-network layers, algorithm-dependent CIM-SRAM unit, FCNL neural-network layers, fully parallel product-sum operation, algorithm-dependent computing-in-memory SRAM unit-macro, hardware cost, PE arrays, PS operation, binary-based CIM-SRAM macro, cost-aware DNN AI edge processors, in-house binary DNN structure, 6T SRAM, CIM system, memory accesses, parallel computing, intermediate data, single-row SRAM access, memory bottleneck, AI edge devices, edge artificial-intelligence devices, computational workload, deep-neural-network processors, binary DNN edge processors, product-sum operation, size 65.0 nm, time 2.3 ns, storage capacity 4 Kbit