Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products

ACM Journal on Emerging Technologies in Computing Systems(2021)

引用 2|浏览25
AbstractMicro-controllers (MCUs) make up most of the processors in the world with widespread applicability from automobile to medical devices. The Internet of Things promises to enable these resource-constrained MCUs with machine learning algorithms to provide always-on intelligence. Many Internet of Things applications consume time-series data that are naturally suitable for recurrent neural networks (RNNs) like LSTMs and GRUs. However, RNNs can be large and difficult to deploy on these devices, as they have few kilobytes of memory. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This article introduces a method to compress RNNs for resource-constrained environments using the Kronecker product (KP). KPs can compress RNN layers by 16× to 38× with minimal accuracy loss. By quantizing the resulting models to 8 bits, we further push the compression factor to 50×. We compare KP with other state-of-the-art compression techniques across seven benchmarks spanning five different applications and show that KP can beat the task accuracy achieved by other techniques by a large margin while simultaneously improving the inference runtime. Sometimes the KP compression mechanism can introduce an accuracy loss. We develop a hybrid KP approach to mitigate this. Our hybrid KP algorithm provides fine-grained control over the compression ratio, enabling us to regain accuracy lost during compression by adding a small number of model parameters.
Neural networks, micro-controllers, matrix decomposition, Kronecker products, model compression, IoT
AI 理解论文