TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization

IEEE Journal of Solid-State Circuits (2023)

Abstract
Computing-in-memory (CIM) is an attractive approach for energy-efficient deep neural network (DNN) processing, especially on low-power edge devices. However, today's typical DNNs usually exceed the capacity of CIM static random access memory (SRAM). The resulting off-chip communication offsets the benefits of the CIM technique, so CIM processors still face a memory bottleneck. To eliminate this bottleneck, we propose a CIM processor, called TT@CIM, which applies the tensor-train decomposition (TTD) method to compress the entire DNN to fit within CIM-SRAM. However, the storage reduction offered by TTD comes at the cost of multiple serial small-size matrix multiplications, which introduce a large number of inefficient multiply-and-accumulate (MAC) and quantization operations (QuantOps). To achieve high energy efficiency, three optimization techniques are proposed in TT@CIM. First, a TTD-CIM-matched dataflow is proposed to maximize CIM utilization and minimize additional MAC operations. Second, a bit-level-sparsity-optimized CIM macro with a high-bit-level-sparsity encoding scheme is designed to reduce the power consumption of each MAC operation. Third, a variable-precision quantization method and a lookup-table-based quantization unit are presented to improve the performance and energy efficiency of QuantOps. Fabricated in 28-nm CMOS and tested on 4/8-bit decomposed DNNs, TT@CIM achieves a peak energy efficiency of 5.99 to 691.13 TOPS/W, depending on the operating voltage.
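
To make the TTD trade-off described in the abstract concrete, the following minimal NumPy sketch (not the TT@CIM implementation; the mode sizes, TT-ranks, and function names are illustrative assumptions) shows how a tensor-train fully-connected layer stores only small cores and replaces one large matrix-vector product with a chain of small matrix multiplications.

    import numpy as np

    # Hypothetical layer: W has shape (M, N) with M = 4*4*4 = 64, N = 8*8*8 = 512.
    m_modes, n_modes = [4, 4, 4], [8, 8, 8]
    ranks = [1, 4, 4, 1]                       # TT-ranks r_0..r_3 (illustrative)

    # Random cores G_k of shape (r_{k-1}, m_k, n_k, r_k) stand in for trained,
    # TT-decomposed weights.
    rng = np.random.default_rng(0)
    cores = [rng.standard_normal((ranks[k], m_modes[k], n_modes[k], ranks[k + 1]))
             for k in range(3)]

    def tt_matvec(cores, x, n_modes):
        # y = W @ x computed directly from the TT cores: each loop iteration is a
        # small contraction (a small matrix multiplication); W is never formed.
        t = x.reshape(n_modes)[None, ...]      # shape (r_0 = 1, n_1, n_2, n_3)
        for g in cores:
            # Absorb one core: contract the current rank and one input mode.
            t = np.einsum('amnb,an...->b...m', g, t)
        return t.reshape(-1)                   # length M = prod(m_modes)

    x = rng.standard_normal(int(np.prod(n_modes)))
    y = tt_matvec(cores, x, n_modes)

    # Check against the explicitly reconstructed dense matrix (small sizes only).
    W = np.einsum('pabq,qcdr,refs->acebdf', *cores).reshape(64, 512)
    assert np.allclose(y, W @ x)

    # Storage: 768 core parameters vs. 32768 dense parameters (about 43x smaller).
    print(sum(g.size for g in cores), W.size)

The sketch also shows why the paper's three optimizations are needed: the single large matrix multiplication becomes several serial small ones whose intermediate results must be re-quantized, which is where the TTD-CIM-matched dataflow, the bit-level-sparsity-optimized macro, and the variable-precision QuantOp unit apply.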
Keywords
Computing-in-memory (CIM), deep neural network (DNN) processor, quantization, sparsity, tensor-train decomposition (TTD), weight compression