Quantization Modes for Neural Network Inference: ASIC Implementation Trade-offs
2023 International Joint Conference on Neural Networks (IJCNN 2023)
Abstract
As deep neural networks migrate closer to the sensors, accuracy can no longer be the sole target: inference tasks must also be highly energy efficient. For embedded devices, the power budget for one inference is typically in the range of a few tens of µW to single-digit mW. We have three levers of action: the computational workload, the number of values to store (be they network parameters or intermediate activation results), and the implementation strategy. Since Application Specific Integrated Circuits are two orders of magnitude more power-efficient than processors for a given technology node, the last issue is addressed with ad-hoc hardware implementations. For the first two issues, we detail and compare in this work different existing quantization approaches, since reducing the number of bits of the weights and activations reduces both computation complexity and storage needs. In addition, we propose two new quantization modes specifically aimed at low-silicon-footprint and power-optimized hardware implementations that still provide accuracy on par with existing works. We report the area/power and accuracy trade-offs all these quantization modes provide when targeting low to ultra-low power devices. The evaluation is done using STMicroelectronics 40nm technology. It shows that the best results vary depending on the dataset and network architecture, which calls for application- and quantization-aware network architecture search.
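The abstract refers to reducing the bit width of weights and activations to cut computation and storage costs. As a minimal illustration of the general idea (not the paper's specific quantization modes), the sketch below shows symmetric uniform quantization of a float tensor to n-bit integers; the function names and the NumPy-based formulation are assumptions for illustration only.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization of a float tensor to num_bits integers.

    Returns integer codes and the scale factor needed to dequantize.
    Hypothetical helper for illustration; not the paper's proposed modes.
    """
    qmax = 2 ** (num_bits - 1) - 1      # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax    # map the largest magnitude to qmax
    q = np.round(x / scale).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from integer codes."""
    return q.astype(np.float32) * scale

# Example: quantize random weights to 8 bits and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, s)
max_err = np.max(np.abs(w - w_hat))  # bounded by scale / 2
```

With round-to-nearest, the reconstruction error of each element is at most half a quantization step (scale / 2), which is why fewer bits trade accuracy for smaller storage and cheaper integer arithmetic.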