CSCMAC - Cyclic Sparsely Connected Neural Network Manycore Accelerator

2020 21st International Symposium on Quality Electronic Design (ISQED)

Abstract
This paper presents an energy-efficient, domain-specific manycore accelerator, the CSCMAC (Cyclic Sparsely Connected Neural Network Manycore Accelerator), which effectively maps and executes deep neural networks (DNNs) compressed with cyclic sparsely connected (CSC) architectures. CSC layers structurally compress and sparsify DNNs, reducing the memory footprint of fully connected (FC) layers from $O(N^{2})$ to $O(N\log N)$ with respect to the number of layer nodes, and are shown to be friendly to hardware implementation. We implement CSC layers for inference on a manycore unit, take advantage of their cyclic architecture, and show that their software implementation, even on a parallel-computing processor, is affordable. To further exploit their implementation simplicity, we propose customized instructions for the manycore that fuse frequently used sequences of machine code, and we evaluate the optimization gained by this customization. Our experimental results using LeNet-300-100 on MNIST and a Multi-Layer Perceptron (MLP) on Physical Activity Monitoring indicate that replacing FC layers with CSC layers achieves $46\times$ and $6\times$ compression, respectively, within a margin of 2% accuracy loss. A 64-cluster architecture of the CSCMAC is fully placed and routed in $65\ \mathrm{nm}$ TSMC CMOS technology. The layout of each cluster occupies an area of $0.73\ \mathrm{mm}^{2}$ and consumes $230.2\ \mathrm{mW}$ at a 980 MHz clock frequency. Our proposed CSCMAC achieves $1.48\times$ higher throughput and $1.49\times$ lower energy compared to its predecessor manycore (PENC). The CSCMAC also achieves $85\times$ higher throughput and $66.4\times$ lower energy compared to a CPU implementation on the NVIDIA Jetson TX2 platform.
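The abstract's central compression claim is the reduction of an FC layer's weight count from $O(N^{2})$ to $O(N\log N)$. The sketch below illustrates only this parameter-count scaling; the factorization into `log` stages of cyclic sparse layers with constant fan-in (`fan`) is an assumption for illustration, not the paper's exact CSC construction.

```python
import math

def fc_params(n):
    # Dense fully connected layer: an n x n weight matrix -> O(N^2).
    return n * n

def csc_params(n, fan=2):
    # Hypothetical CSC-style decomposition (illustrative assumption):
    # log_fan(n) cascaded sparse cyclic stages, each holding n * fan
    # nonzero weights -> O(N log N) total.
    stages = int(round(math.log(n, fan)))
    return stages * n * fan

n = 1024
print(fc_params(n))   # 1048576 weights for the dense layer
print(csc_params(n))  # 20480 weights under the sketched decomposition
```

For $N = 1024$, the dense layer needs $1024^{2} = 1{,}048{,}576$ weights, while the sketched decomposition needs $10 \times 1024 \times 2 = 20{,}480$, a roughly $51\times$ reduction, consistent in scale with the $46\times$ compression the paper reports for LeNet-300-100.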
Keywords
Programmable Manycore Accelerator, Model Compression, Complexity Reduction, Cyclic Sparsely Connected Layers