Machine Learning Hardware Design for Efficiency, Flexibility, and Scalability [Feature]

IEEE Circuits and Systems Magazine (2023)

Abstract
The widespread use of deep neural networks (DNNs) and DNN-based machine learning (ML) methods justifies treating DNN computation as a workload class in its own right. Beginning with a brief review of DNN workloads and computation, we provide an overview of single instruction multiple data (SIMD) and systolic array architectures. These two basic architectures support the kernel operations of DNN computation, and they form the core of many flexible DNN accelerators. To achieve higher performance and efficiency, sparse DNN hardware can be designed to exploit data sparsity. We present common approaches, from compressed storage to sparse-data processing, that reduce memory and bandwidth usage and improve energy efficiency and performance. To accommodate the rapid evolution of models of growing size and complexity, modular chiplet integration can be a promising path forward. We survey recent work on homogeneous tiling and heterogeneous integration that scales hardware up and out to support larger models and more complex functions.
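To make the compressed-storage idea concrete, the sketch below shows one common format, compressed sparse row (CSR), which stores only the nonzero weights of a pruned matrix plus index metadata, so a matrix-vector product fetches and multiplies only nonzeros. This is a minimal NumPy illustration of the general technique, not code from the paper; the function names dense_to_csr and csr_matvec are ours.

```python
import numpy as np

def dense_to_csr(dense):
    """Convert a dense matrix to CSR: nonzero values, their column
    indices, and row pointers delimiting each row's nonzeros."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product: only nonzero weights are
    fetched from memory and multiplied, skipping all zeros."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

# Example: a pruned weight matrix that is 75% zeros.
W = np.array([[0., 2., 0., 0.],
              [1., 0., 0., 3.],
              [0., 0., 0., 0.]])
x = np.array([1., 2., 3., 4.])
vals, cols, ptrs = dense_to_csr(W)
print(csr_matvec(vals, cols, ptrs, x))        # matches W @ x
print(f"stored {len(vals)} of {W.size} values")
```

The savings scale with sparsity: storage and multiply count drop from the full matrix size to the number of nonzeros, at the cost of irregular, index-driven memory access, which is exactly the challenge that dedicated sparse DNN hardware addresses.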
Keywords
Surveys, Scalability, Multichip modules, Artificial neural networks, Machine learning, Bandwidth, Tutorials, Design engineering, Hardware design languages, ML hardware, DNN accelerator, sparse DNN architecture, DNN chiplet, heterogeneous integration