A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

An Guo, Xi Chen, Fangyuan Dong, Xingyu Pu, Dongqi Li, Jingmin Zhang, Xueshan Dong, Hui Gao, Yiran Zhang, Bo Wang, Jun Yang, Xin Si

IEEE Journal of Solid-State Circuits (2024)

Abstract
With the rapid advancement of artificial intelligence (AI), computing-in-memory (CIM) structures have been proposed to improve energy efficiency (EF). However, previous CIMs often rely on INT8 data types, which pose challenges when addressing more complex networks, larger datasets, and increasingly intricate tasks. This work presents a double-bit 6T static random-access memory (SRAM)-based floating-point CIM macro using: 1) a cell array with double-bitcells (DBcells) and floating-point computing units (FCUs) to improve throughput without sacrificing inference accuracy; 2) an FCU with a high-bit full-precision multiply cell (HFMC) and a low-bit approximate-calculation multiply cell (LAMC) to reduce internal bandwidth and area cost; 3) a CIM macro architecture with FP processing circuits to support both floating-point multiplication and accumulation (FP-MAC) and integer (INT) MAC operations; 4) a new ShareFloatv2 data type to map floating point onto the CIM array; and 5) a lookup table (LUT)-based TensorFlow training method to improve inference accuracy. A fabricated 28-nm 64-kb digital-domain SRAM-CIM macro achieved the best EF (31.6 TFLOPS/W) and the highest area efficiency (2.05 TFLOPS/mm²) for FP-MAC with Brain Float16 (BF16) IN/W/OUT on three AI tasks: classification@CIFAR100, detection@COCO, and segmentation@VOC2012.
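The abstract does not spell out the ShareFloatv2 format, but the general idea behind such shared-exponent data types, letting a group of values reuse one exponent so that the in-array accumulation reduces to a pure integer MAC, can be illustrated with a minimal sketch. The group size G and mantissa width M_BITS below are illustrative assumptions, not the paper's actual parameters.

```python
# Minimal shared-exponent (block floating-point) sketch. Not the paper's
# ShareFloatv2 definition; G and M_BITS are assumed for illustration only.
import numpy as np

G = 16        # assumed group size sharing one exponent
M_BITS = 8    # assumed signed-mantissa width

def to_shared_float(x):
    """Quantize a 1-D group to (shared exponent, integer mantissas)."""
    exp = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-38)))   # one exponent per group
    scale = 2.0 ** (exp - (M_BITS - 1))
    mant = np.clip(np.round(x / scale),
                   -(2 ** (M_BITS - 1)), 2 ** (M_BITS - 1) - 1)
    return exp, mant.astype(np.int64)

def shared_float_mac(x, w):
    """FP-like dot product computed as an INT MAC plus exponent scaling."""
    ex, mx = to_shared_float(x)
    ew, mw = to_shared_float(w)
    acc = int(np.dot(mx, mw))           # integer MAC: the CIM-friendly part
    return acc * 2.0 ** (ex + ew - 2 * (M_BITS - 1))

rng = np.random.default_rng(0)
x = rng.normal(size=G).astype(np.float32)
w = rng.normal(size=G).astype(np.float32)
print(shared_float_mac(x, w), float(np.dot(x, w)))   # approximate vs. exact
```

The design point this illustrates is why such formats suit digital SRAM-CIM: once exponents are shared per group, the expensive per-element exponent alignment of true FP-MAC disappears and the array only needs to perform integer multiply-accumulate, with a single exponent correction applied at the output.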
Keywords
Artificial intelligence (AI), computing-in-memory (CIM), double-bit 6T static random-access memory (SRAM), floating-point multiplication and accumulation (MAC) operation, ShareFloat