Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic

PACT '20: International Conference on Parallel Architectures and Compilation Techniques, Virtual Event, GA, USA, October 2020

Abstract
Albeit low-power, mixed-signal circuitry suffers from the significant overhead of Analog to Digital (A/D) conversion, a limited range for information encoding, and susceptibility to noise. This paper aims to address these challenges by offering and leveraging the following mathematical insight regarding vector dot-product, the basic operator in Deep Neural Networks (DNNs): this operator can be reformulated as a wide regrouping of spatially parallel low-bitwidth calculations that are interleaved across the bit partitions of multiple elements of the vectors. As such, the computational building block of our accelerator becomes a wide bit-interleaved analog vector unit comprising a collection of low-bitwidth multiply-accumulate modules that operate in the analog domain and share a single A/D converter (ADC). This bit-partitioning results in a lower-resolution ADC, while the wide regrouping alleviates the need for A/D conversion per operation, amortizing its cost across multiple bit-partitions of the vector elements. Moreover, the low-bitwidth modules require a smaller encoding range and also provide larger margins for noise mitigation. We also utilize switched-capacitor design for our bit-level reformulation of DNN operations. The proposed switched-capacitor circuitry performs the regrouped multiplications in the charge domain and accumulates the results of the group in its capacitors over multiple cycles. The capacitive accumulation combined with wide bit-partitioned regrouping reduces the rate of A/D conversions, further improving the overall efficiency of the design. With this mathematical reformulation and its switched-capacitor implementation, we define one possible 3D-stacked microarchitecture, dubbed BIHIWE, that leverages clustering and hierarchical design to best utilize the power-efficiency of the mixed-signal domain and 3D stacking. We also build models for noise, computational non-idealities, and variations. For ten DNN benchmarks, BIHIWE delivers 5.5x speedup over a leading purely-digital 3D-stacked accelerator, TETRIS, with less than 0.5% accuracy loss, achieved by careful treatment of noise, computation error, and various forms of variation. Compared to the RTX 2080 Ti with tensor cores and the Titan Xp GPU, both with 8-bit execution, BIHIWE offers 35.4x and 70.1x higher Performance-per-Watt, respectively. Relative to the mixed-signal RedEye, ISAAC, and PipeLayer, BIHIWE offers 5.5x, 3.6x, and 9.6x improvement in Performance-per-Watt, respectively. The results suggest that BIHIWE is an effective initial step on a road that combines mathematics, circuits, and architecture.
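To make the bit-partitioned regrouping concrete, the sketch below rebuilds an integer dot product from low-bitwidth partition pairs, assuming 8-bit unsigned elements sliced into 2-bit partitions; the function name, slicing parameters, and NumPy implementation are illustrative assumptions, not the paper's circuit. Each inner low-bitwidth group corresponds to what BIHIWE would compute in the charge domain behind a shared ADC, while the shifts and final accumulation remain digital.

```python
import numpy as np

def bit_partitioned_dot(x, y, bitwidth=8, part_bits=2):
    """Numerical sketch of the bit-partitioned dot-product regrouping.

    Each element is split into (bitwidth // part_bits) low-bitwidth
    partitions; the dot product is regrouped into one low-bitwidth
    dot product per partition pair. In the accelerator, each such group
    would be evaluated in the analog (charge) domain and share one ADC;
    the shifts and the final sum stay digital.
    """
    n_parts = bitwidth // part_bits
    mask = (1 << part_bits) - 1

    # Slice every element into its bit partitions (spatially parallel view).
    x_parts = [(x >> (p * part_bits)) & mask for p in range(n_parts)]
    y_parts = [(y >> (q * part_bits)) & mask for q in range(n_parts)]

    total = 0
    for p in range(n_parts):
        for q in range(n_parts):
            # Low-bitwidth multiply-accumulate group (analog in BIHIWE).
            group_sum = int(np.dot(x_parts[p], y_parts[q]))
            # Digital shift-and-add recombines the partition pair.
            total += group_sum << ((p + q) * part_bits)
    return total

# Check the regrouped result against an ordinary dot product.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=64, dtype=np.int64)
y = rng.integers(0, 256, size=64, dtype=np.int64)
assert bit_partitioned_dot(x, y) == int(np.dot(x, y))
```

Note how the regrouping exposes many narrow dot products of the same length as the original vector, which is what lets a single low-resolution ADC be amortized over an entire group rather than over one multiply-accumulate at a time.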
Keywords
Accelerators, Deep Neural Networks, DNN, DNN Acceleration, Analog/Mixed-Signal Computing, Mixed-Signal Acceleration, Bit-Partitioning, Spatial Bit-Level Regrouping, Analog Error Modeling