tubGEMM: Energy-Efficient and Sparsity-Effective Temporal-Unary-Binary Based Matrix Multiply Unit

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2023

Abstract
General Matrix Multiplication (GEMM) is a ubiquitous compute kernel in deep learning (DL). To support energy-efficient edge-native processing, new GEMM hardware units have been proposed that operate on unary encoded bitstreams using much simpler hardware. Most unary approaches thus far focus on rate-based unary encoding of values and perform stochastic approximate computation. This work presents tubGEMM, a novel matrix-multiply unit design that employs hybrid temporal-unary and binary (tub) encoding and performs exact (not approximate) GEMM. It intrinsically exploits dynamic value sparsity to improve energy efficiency. Compared to the current best unary design uGEMM, tubGEMM significantly reduces area, power, and energy by 89%, 87%, and 50% respectively. A tubGEMM design performing 128x128 matrix multiply on 8-bit integers, in commercial TSMC N5 (5nm) process node, consumes just 0.22 $\mathrm{mm}^{2}$ die area, 417.72 mW power, and 8.86 $\mu$J energy, assuming no sparsity. Typical sparsity in DL workloads (MobileNetv2, ResNet50) reduces energy by more than 3x, and lowering precision to 4 and 2 bits further reduces it by 24x and 104x respectively.
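
To make the idea of exact temporal-unary and binary multiplication concrete, the following is a minimal software sketch, not the tubGEMM hardware design. It assumes one operand is presented as a temporal-unary pulse train whose length equals its magnitude, while the other stays in binary and is accumulated once per pulse. The function names `tub_multiply` and `tub_gemm` and the step counter are hypothetical and exist only for illustration. The step count is a rough proxy for switching activity, not a model of hardware latency or energy.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def tub_multiply(a: int, b: int) -> Tuple[int, int]:
    """Multiply a (treated as temporal-unary) by b (binary).

    Assumption: the unary operand contributes one pulse per unit of
    magnitude, and the binary operand is added once per pulse.
    Returns the exact product and the number of accumulate steps.
    """
    sign = -1 if a < 0 else 1
    acc = 0
    steps = 0
    for _ in range(abs(a)):  # one accumulate per unary pulse
        acc += b
        steps += 1
    return sign * acc, steps


def tub_gemm(A: Matrix, B: Matrix) -> Tuple[Matrix, int]:
    """Exact integer GEMM built from tub_multiply, with a step count."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    total_steps = 0
    for i in range(rows):
        for j in range(cols):
            for p in range(inner):
                prod, steps = tub_multiply(A[i][p], B[p][j])
                C[i][j] += prod
                total_steps += steps
    return C, total_steps


A = [[3, 0], [-2, 1]]
B = [[4, 5], [6, -7]]
C, steps = tub_gemm(A, B)
print(C, steps)  # [[12, 15], [-2, -17]] 12
```

In this sketch a zero in the unary operand costs no accumulate steps at all, and smaller magnitudes cost fewer steps. That is one plausible reading of how value sparsity and reduced precision could lower energy in a temporal-unary scheme, but the paper's actual mechanism may differ.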
Keywords
GEMM, temporal unary compute, sparsity