Compression and Speed-up of Convolutional Neural Networks Through Dimensionality Reduction for Efficient Inference on Embedded Multiprocessor

Lucas Fernández Brillet, Nicolas Leclaire, Stéphane Mancini, Marina Nicolas, Sébastien Cleyet-Merle, Jean-Paul Henriques, Claude Delnondedieu

Journal of Signal Processing Systems (2021)

Abstract
The computational complexity of state-of-the-art Convolutional Neural Networks (CNNs) makes their integration into embedded systems with low power consumption requirements a challenging task, one that calls for the joint design and adaptation of hardware and algorithms. In this paper, we propose a new general CNN compression method that reduces both the number of parameters and the number of operations. To this end, we introduce a Principal Component Analysis (PCA) based compression, which relies on an optimal transformation (in the mean squared error sense) of the filters of each layer into a new representation space in which the convolutions are then applied. Compression is achieved by dimensioning this new representation space, with an arbitrarily controlled accuracy degradation of the compressed CNN. PCA compression is evaluated on multiple networks and datasets from the state of the art and applied to a binary face classification network. To show the versatility of the method and its usefulness in adapting a CNN to a hardware computing system, the compressed face classification network is implemented and evaluated on a custom embedded multiprocessor. Results show that, for example, an overall compression rate of 2x can be achieved on a compact ResNet-32 model on the CIFAR-10 dataset with a negligible accuracy loss of 2%, while compression rates of up to 11x can be achieved on specific layers with negligible accuracy loss.
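The core idea described in the abstract can be illustrated with a small sketch: treat a layer's filter bank as a matrix, compute a PCA basis for it, and represent each filter with only its first d principal coefficients plus a shared basis. The layer shapes, the choice of d, and the parameter accounting below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer: 64 filters, each with fan-in 32 channels x 3x3 kernel.
# (Shapes are assumptions for illustration; real filters would come from a trained CNN.)
n_filters, fan_in = 64, 32 * 3 * 3
W = rng.standard_normal((n_filters, fan_in))

# PCA of the filter bank: center the filters, then SVD of the centered matrix.
mean = W.mean(axis=0)
Wc = W - mean
U, S, Vt = np.linalg.svd(Wc, full_matrices=False)

# Keep only d principal directions; d dimensions the new representation space
# and controls the accuracy/compression trade-off.
d = 16
basis = Vt[:d]            # (d, fan_in) shared projection basis
coeffs = Wc @ basis.T     # each filter is now described by d coefficients

# Approximate reconstruction of the original filters from the compact form.
W_approx = coeffs @ basis + mean

# Parameter count: d coefficients per filter + shared basis + mean,
# versus fan_in parameters per filter originally.
orig_params = n_filters * fan_in
comp_params = n_filters * d + d * fan_in + fan_in
rate = orig_params / comp_params
rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
print(f"compression rate: {rate:.2f}x, relative reconstruction error: {rel_err:.3f}")
```

At inference time, the convolutions can equivalently be carried out in the d-dimensional PCA space (convolve the input with the d basis filters, then mix with the per-filter coefficients), which is where the reduction in operations comes from.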
Keywords
CNN, Compression, Speed-up, PCA, Face detection, Embedded, Multicore, Accelerator