TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training

2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020

Abstract
TensorDash is a hardware-based technique that enables data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency. TensorDash combines a low-cost sparse input operand interconnect with an area-efficient hardware scheduler. The scheduler can effectively extract sparsity in the activations, the weights, and the gradients. Over a wide set of state-of-the-art models covering various applications, TensorDash accelerates the training process by 1.95× while being 1.5× more energy efficient when incorporated on top of a Tensorcore-based accelerator at less than 5% area overhead. TensorDash is datatype agnostic, and we demonstrate it with IEEE standard mixed-precision floating-point units and a popular floating-point format optimized for machine learning (BFloat16).
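The abstract's core claim is that multiply-accumulate pairs with a zero operand do no useful work, so a scheduler that skips them and packs the remaining pairs onto the MAC lanes finishes the same computation in fewer cycles. The following is a minimal, hypothetical software-level sketch of that idea only; the lane count, function names, and cycle model are illustrative assumptions and do not describe the paper's actual interconnect or scheduler hardware.

```python
# Hypothetical sketch: skipping zero-operand MAC pairs reduces cycle count
# without changing the accumulated result. Not the paper's hardware design.
from typing import List, Tuple


def dense_mac_cycles(a: List[float], b: List[float], lanes: int) -> int:
    """A dense data-parallel MAC array processes every pair, zero or not."""
    assert len(a) == len(b)
    return -(-len(a) // lanes)  # ceil(pairs / lanes)


def sparse_mac_cycles(a: List[float], b: List[float], lanes: int) -> Tuple[float, int]:
    """Skip pairs where either operand is zero, then pack survivors onto the lanes."""
    assert len(a) == len(b)
    effectual = [(x, y) for x, y in zip(a, b) if x != 0.0 and y != 0.0]
    acc = sum(x * y for x, y in effectual)          # result is unchanged
    cycles = max(1, -(-len(effectual) // lanes))    # fewer cycles when streams are sparse
    return acc, cycles


if __name__ == "__main__":
    # Activations and gradients during training often contain many exact zeros.
    acts = [0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.5, 0.0]
    wts  = [0.3, 0.2, 0.0, 1.1, 0.4, 0.0, 0.0, 0.9]
    acc, sparse_c = sparse_mac_cycles(acts, wts, lanes=4)
    dense_c = dense_mac_cycles(acts, wts, lanes=4)
    print(f"sum = {acc}, dense cycles = {dense_c}, sparse cycles = {sparse_c}")
```

In this toy run only two of the eight operand pairs are effectual, so a 4-lane array needs one cycle instead of two; the paper's contribution is doing this packing in hardware at low area cost across activations, weights, and gradients.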
Keywords
TensorDash, accelerate deep neural network training, hardware-based technique, data-parallel MAC units, input operand streams, hardware accelerator, deep learning, energy efficiency, low-cost sparse input operand interconnect, area-efficient hardware scheduler, Tensorcore-based accelerator