Deep Neural Network Acceleration Based on Low-Rank Approximated Channel Pruning

IEEE Transactions on Circuits and Systems I: Regular Papers (2020)

Cited by 32 | Viewed 60
Abstract
Acceleration and compression of deep Convolutional Neural Networks (CNNs) have become critical for deploying intelligence on resource-constrained devices. Channel pruning can be deployed and accelerated easily, without specialized hardware or software. However, it does not fully exploit weight-level redundancy, which limits its compression ratio. In this work, we propose a Low-rank Approximated channel Pruning (LAP) framework that tackles this problem in two targeted steps. First, we use low-rank approximation to eliminate the redundancy within each filter; this step yields acceleration, especially in shallow layers, and converts filters into smaller, more compact ones. Second, we apply channel pruning to the approximated network globally, obtaining further gains, especially in deep layers. In addition, we propose a spectral-norm-based indicator to better coordinate the two steps. Moreover, inspired by the integral measures used in video coding, we propose an evaluator based on the Integral of the Decay Curve (IDC) to judge the efficiency of acceleration and compression algorithms. Ablation experiments and the IDC evaluator show that LAP significantly improves channel pruning. To further demonstrate hardware compatibility, the network produced by LAP achieves impressive speedup efficiency on an FPGA.
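To make the ingredients above concrete, here is a minimal Python/NumPy sketch: a truncated-SVD factorization standing in for the low-rank approximation step (splitting a k×k convolution into a smaller basis convolution plus a 1×1 convolution), a spectral-norm helper of the kind the paper's indicator could be built on, and a trapezoidal integral of an accuracy-decay curve in the spirit of the IDC evaluator. The function names, the SVD-based factorization, and the integral normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def low_rank_decompose(W, rank):
    """Factorize a filter bank W of shape (out_c, in_c, kh, kw) into a
    basis convolution (rank, in_c, kh, kw) followed by a pointwise
    convolution (out_c, rank, 1, 1) via truncated SVD."""
    out_c, in_c, kh, kw = W.shape
    M = W.reshape(out_c, in_c * kh * kw)              # one row per filter
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                      # absorb singular values
    W_basis = Vt[:rank].reshape(rank, in_c, kh, kw)   # smaller spatial conv
    W_point = U_r.reshape(out_c, rank, 1, 1)          # 1x1 conv restores out_c
    return W_basis, W_point

def spectral_norm(W):
    """Largest singular value of the unfolded filter matrix (a quantity
    a coordination indicator could be built on; exact use is assumed)."""
    return np.linalg.svd(W.reshape(W.shape[0], -1), compute_uv=False)[0]

def idc_score(compression_ratios, accuracies):
    """Area under the accuracy-decay curve (trapezoidal rule); a larger
    area means accuracy degrades more gracefully as compression grows."""
    r = np.asarray(compression_ratios, dtype=float)
    a = np.asarray(accuracies, dtype=float)
    order = np.argsort(r)
    return np.trapz(a[order], r[order])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 32, 3, 3))           # toy 3x3 filter bank
    Wb, Wp = low_rank_decompose(W, rank=16)
    approx = (Wp.reshape(64, 16) @ Wb.reshape(16, -1)).reshape(W.shape)
    print("relative error:", np.linalg.norm(W - approx) / np.linalg.norm(W))
    print("spectral norm:", spectral_norm(W))
    print("IDC:", idc_score([1, 2, 4, 8], [0.76, 0.75, 0.73, 0.68]))
```

With this split, a layer's multiply-accumulate count per output position drops roughly from out_c·in_c·kh·kw to rank·(in_c·kh·kw + out_c), so choosing a rank well below out_c is what buys the speedup in shallow, spatially large layers.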
Keywords
Deep learning, network acceleration, channel pruning, low-rank approximation, efficiency evaluation, hardware resources