Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial.

IEEE Trans. Circuits Syst. II Express Briefs (2024)

Abstract
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence generated content (AIGC) and robotics. To support these tasks efficiently, model pruning has been developed to compress computation- and memory-intensive DNNs. However, directly executing the resulting sparse models on a conventional hardware accelerator can cause significant under-utilization, since the invalid data introduced by sparse patterns leads to unnecessary computations and irregular memory accesses. This brief analyzes the critical issues in accelerating sparse models and provides an overview of typical hardware designs for various sparse DNNs, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformers. Following the overview, we give practical guidelines for designing efficient accelerators for sparse DNNs, with qualitative metrics to evaluate hardware overhead under different cases. In addition, we highlight potential opportunities for hardware/software/algorithm co-optimization from the perspective of sparse DNN implementation, and provide insights into recent design trends for the efficient implementation of Transformers with sparse attention, which facilitates large language model (LLM) deployment with high throughput and energy efficiency.
Keywords
Hardware Acceleration, Sparsity, CNN, Transformer, Tutorial, Deep Learning