Dynamic ConvNets on Tiny Devices via Nested Sparsity

IEEE Internet of Things Journal (2023)

Abstract
This work introduces a new training and compression pipeline for building nested sparse convolutional neural networks (ConvNets), a class of dynamic ConvNets suited to inference tasks deployed on resource-constrained devices at the edge of the Internet of Things. A nested sparse ConvNet consists of a single ConvNet architecture containing $N$ sparse subnetworks with nested weight subsets, like a Matryoshka doll, and can trade accuracy for latency at runtime using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weight subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format with dedicated compute kernels that exploit the structure of the nested weight subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 microcontroller unit (MCU), nested sparse ConvNets outperform variable-latency solutions naively built by assembling single sparse models trained as stand-alone instances, achieving 1) comparable accuracy; 2) remarkable storage savings; and 3) high performance. Moreover, compared to state-of-the-art dynamic strategies such as dynamic pruning and layer width scaling, nested sparse ConvNets are Pareto-optimal in the accuracy versus latency space.
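The core idea of the abstract — nested weight subsets selectable at runtime, trained with masked gradients — can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' pipeline: it assumes magnitude-based pruning to produce the nesting, and the function names (`nested_masks`, `masked_grad_step`) are invented for this example.

```python
import numpy as np

def nested_masks(weights, sparsities):
    """Build nested binary masks by weight magnitude.

    Because each mask keeps the top-magnitude weights, the mask for a
    higher sparsity level is a subset of the mask for a lower one,
    like a Matryoshka doll.
    """
    order = np.argsort(np.abs(weights).ravel())[::-1]  # descending magnitude
    masks = []
    for s in sorted(sparsities):  # ascending sparsity -> densest mask first
        keep = int(round((1.0 - s) * weights.size))
        m = np.zeros(weights.size, dtype=bool)
        m[order[:keep]] = True
        masks.append(m.reshape(weights.shape))
    return masks

def masked_grad_step(weights, grad, mask, lr=0.1):
    """Gradient masking: only weights active in the selected
    subnetwork receive updates; the rest are left untouched."""
    return weights - lr * grad * mask

# Example: a 2x2 weight tensor with two sparsity levels (25% and 50%).
w = np.array([[0.5, -0.1],
              [0.9, 0.02]])
dense_mask, sparse_mask = nested_masks(w, [0.25, 0.5])
w_next = masked_grad_step(w, np.ones_like(w), sparse_mask)
```

At inference time, selecting a sparser mask trades accuracy for latency; the nesting means all subnetworks share one stored weight set rather than $N$ independent models.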
Keywords
Internet of Things, latency-quality scaling, microcontroller units (MCUs), neural network compression