TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training

2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020

Abstract
TensorDash is a hardware-based technique that enables data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency. TensorDash combines a low-cost sparse input operand interconnect with an area-efficient hardware scheduler. The scheduler can effectively extract sparsity in the activations, the weights, and the gradients. Over a wide set of state-of-the-art models covering various applications, TensorDash accelerates the training process by 1.95× while being 1.5× more energy efficient when incorporated on top of a Tensorcore-based accelerator at less than 5% area overhead. TensorDash is datatype agnostic, and we demonstrate it with IEEE standard mixed-precision floating-point units and a popular floating-point format optimized for machine learning (BFloat16).
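The abstract's core claim is that multiply-accumulate pairs with a zero operand do no useful work, so a scheduler that skips them and packs the remaining pairs onto the MAC lanes finishes the same computation in fewer cycles. The following is a minimal, hypothetical software-level sketch of that idea only; the lane count, function names, and cycle model are illustrative assumptions and do not describe the paper's actual interconnect or scheduler hardware.

```python
# Hypothetical sketch: skipping zero-operand MAC pairs reduces cycle count
# without changing the accumulated result. Not the paper's hardware design.
from typing import List, Tuple


def dense_mac_cycles(a: List[float], b: List[float], lanes: int) -> int:
    """A dense data-parallel MAC array processes every pair, zero or not."""
    assert len(a) == len(b)
    return -(-len(a) // lanes)  # ceil(pairs / lanes)


def sparse_mac_cycles(a: List[float], b: List[float], lanes: int) -> Tuple[float, int]:
    """Skip pairs where either operand is zero, then pack survivors onto the lanes."""
    assert len(a) == len(b)
    effectual = [(x, y) for x, y in zip(a, b) if x != 0.0 and y != 0.0]
    acc = sum(x * y for x, y in effectual)          # result is unchanged
    cycles = max(1, -(-len(effectual) // lanes))    # fewer cycles when streams are sparse
    return acc, cycles


if __name__ == "__main__":
    # Activations and gradients during training often contain many exact zeros.
    acts = [0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.5, 0.0]
    wts  = [0.3, 0.2, 0.0, 1.1, 0.4, 0.0, 0.0, 0.9]
    acc, sparse_c = sparse_mac_cycles(acts, wts, lanes=4)
    dense_c = dense_mac_cycles(acts, wts, lanes=4)
    print(f"sum = {acc}, dense cycles = {dense_c}, sparse cycles = {sparse_c}")
```

In this toy run only two of the eight operand pairs are effectual, so a 4-lane array needs one cycle instead of two; the paper's contribution is doing this packing in hardware at low area cost across activations, weights, and gradients.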
Keywords
TensorDash, accelerate deep neural network training, hardware-based technique, data-parallel MAC units, input operand streams, hardware accelerator, deep learning, energy efficiency, low-cost sparse input operand interconnect, area-efficient hardware scheduler, Tensorcore-based accelerator