EdgeL^3: Compressing L^3-Net for Mote-Scale Urban Noise Monitoring

2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Abstract
Urban noise sensing in deeply embedded devices at the edge of the Internet of Things (IoT) is challenging not only because of the lack of sufficiently labeled training data but also because device resources are quite limited. Look, Listen, and Learn (L^3), a recently proposed state-of-the-art transfer learning technique, mitigates the first challenge by training self-supervised deep audio embeddings through binary Audio-Visual Correspondence (AVC), and the resulting embeddings can be used to train a variety of downstream audio classification tasks. However, with close to 4.7 million parameters, the multi-layer L^3-Net CNN is still prohibitively expensive to run on small edge devices, such as "motes" that use a single microcontroller and limited memory to achieve long-lived self-powered operation. In this paper, we comprehensively explore the feasibility of compressing the L^3-Net for mote-scale inference. We use pruning, ablation, and knowledge distillation techniques to show that the originally proposed L^3-Net architecture is substantially overparameterized, not only for AVC but also for the target task of sound classification, as evaluated on two popular downstream datasets. Our findings demonstrate the value of fine-tuning and knowledge distillation in regaining the performance lost through aggressive compression strategies. Finally, we present EdgeL^3, the first L^3-Net reference model compressed by 1-2 orders of magnitude for real-time urban noise monitoring on resource-constrained edge devices; it fits in just 0.4 MB of memory using a half-precision floating-point representation.
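The abstract names two of its core compression techniques: magnitude-based weight pruning and knowledge distillation. The following is a minimal sketch of both, not the authors' code (the original L^3-Net is a Keras model; this sketch uses PyTorch, and the `conv` layer in the usage comment is hypothetical):

```python
import torch
import torch.nn.functional as F

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = max(1, int(weight.numel() * sparsity))
    # k-th smallest absolute value serves as the pruning threshold
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold).to(weight.dtype)

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_targets,
                    reduction="batchmean") * temperature ** 2

# Example usage: prune a (hypothetical) conv layer to 80% sparsity in place.
# with torch.no_grad():
#     conv.weight.copy_(magnitude_prune(conv.weight, 0.8))
```

The 0.4 MB figure is consistent with this pipeline: at half precision (2 bytes per parameter), 0.4 MB holds roughly 200k parameters, about a 24x reduction from the original 4.7 million, i.e., within the stated 1-2 orders of magnitude.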
Keywords
edge network, pruning, convolutional neural nets, deep learning, audio embedding, transfer learning, fine-tuning, knowledge distillation