On the Complexity Reduction of Dense Layers from O(N²) to O(NlogN) with Cyclic Sparsely Connected Layers

Proceedings of the 56th Annual Design Automation Conference 2019 (2019)

Cited by 23 | Views 34
Abstract
In deep neural networks (DNNs), model size is an important factor affecting performance, energy efficiency, and scalability. Recent work on weight pruning has shown significant reductions in model size at the expense of irregularity in the DNN architecture, which necessitates additional indexing memory to address non-zero weights, thereby increasing chip size, energy consumption, and delay. In this paper, we propose cyclic sparsely connected (CSC) layers, with a memory/computation complexity of O(N log N), that can be used as an overlay for fully connected (FC) layers, whose O(N^2) parameters can dominate the parameters of the entire DNN model. Each CSC layer is composed of a few sequential layers, referred to as support layers, which together provide full connectivity between the inputs and outputs of the CSC layer. We introduce an algorithm that trains models with FC layers replaced by CSC layers in a bottom-up fashion, incrementally increasing CSC layer characteristics such as connectivity and number of synapses until the desired accuracy is reached for a given compression rate. One advantage of CSC layers is that no indexing of non-zero weights is required. Our experimental results using AlexNet on ImageNet and LeNet-300-100 on MNIST indicate that by substituting FC layers with CSC layers, we can achieve 10× to 46× compression within a margin of 2% accuracy loss, which is comparable to non-structural pruning methods. A scalable parallel hardware architecture implementing CSC layers, and an equivalent scalable parallel architecture efficiently implementing non-structurally pruned FC layers, are designed and fully placed and routed on an Artix-7 FPGA and in 65 nm CMOS ASIC technology for the LeNet-300-100 model. The results indicate that, when running at the same frequency and with an equal compression rate, the proposed CSC hardware outperforms the conventional non-structurally pruned architecture by ~2× in power, energy, area, and resource utilization.
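The abstract describes a CSC layer as a few sequential sparse "support" layers whose composition connects every input to every output with O(N log N) weights instead of the O(N^2) weights of a dense layer. The NumPy sketch below is only a hedged illustration of that general idea using a cyclic, butterfly-like stride pattern; the function name csc_like_layer, the stride schedule, and the two-weights-per-output layout are illustrative assumptions, not the paper's exact CSC construction or training algorithm.

```python
# Sketch (assumed layout, not the paper's exact CSC construction): a dense
# N x N layer has O(N^2) weights; a stack of log2(N) sparse support layers,
# each with O(N) weights in a cyclic pattern, keeps a path from every input
# to every output at O(N log N) total cost.
import numpy as np

def csc_like_layer(x, weights, strides):
    """Apply a sequence of sparse support layers to input vector x.

    weights[k] has shape (N, 2): weights[k][i] = (w_self, w_neighbor) for
    output i of support layer k, whose only inputs are positions i and
    (i + strides[k]) % N, i.e. 2N parameters per support layer.
    """
    n = x.shape[0]
    for w, s in zip(weights, strides):
        idx = (np.arange(n) + s) % n        # cyclic neighbor index
        x = w[:, 0] * x + w[:, 1] * x[idx]  # sparse matrix-vector product
    return x

n = 8                                       # toy size (power of two)
strides = [1, 2, 4]                         # log2(n) support layers
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n, 2)) for _ in strides]

y = csc_like_layer(rng.standard_normal(n), weights, strides)
dense_params = n * n                        # 64 weights for a dense layer
csc_params = sum(w.size for w in weights)   # 48 weights for the sparse stack
print(y.shape, dense_params, csc_params)
```

For this toy size the saving is modest (48 vs. 64 weights), but at N = 1024 the same pattern uses roughly 2N·log2(N) ≈ 20K weights instead of about one million, which is the kind of reduction the abstract attributes to replacing FC layers with CSC layers.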
Keywords
FC layers, CSC layer, cyclic sparsely connected layers, sequential layers, fully connected layers, dense layer complexity reduction, deep neural networks, DNN architecture, scalable parallel hardware architecture, nonstructurally pruned FC layers