Quantization Modes for Neural Network Inference: ASIC Implementation Trade-offs
2023 International Joint Conference on Neural Networks (IJCNN 2023)
Abstract
As deep neural networks migrate closer to the sensors, accuracy can no longer be the sole target: inference tasks must also be highly energy efficient. For embedded devices, the power budget for one inference is typically in the range of a few tens of µW to single-digit mW. We have three levers of action: the computational workload, the number of values to store (be they network parameters or intermediate activation results), and the implementation strategy. Since Application Specific Integrated Circuits are two orders of magnitude more power-efficient than processors for a given technology node, the last issue is addressed with ad-hoc hardware implementations. For the first two issues, we detail and compare in this work different existing quantization approaches, since reducing the number of bits of the weights and activations reduces both computation complexity and storage needs. In addition, we propose two new quantization modes specifically aimed at low-silicon-footprint and power-optimized hardware implementations that still provide accuracy on par with existing works. We report the area/power and accuracy trade-offs all these quantization modes provide when targeting low to ultra-low power devices. The evaluation is done using STMicroelectronics 40nm technology. It shows that the best results vary depending on the dataset and network architecture, which calls for application- and quantization-aware network architecture search.
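The abstract refers to reducing the bit width of weights and activations to cut computation and storage costs. As a minimal illustration of the general idea (not the paper's specific quantization modes), the sketch below shows symmetric uniform quantization of a float tensor to n-bit integers; the function names and the NumPy-based formulation are assumptions for illustration only.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization of a float tensor to num_bits integers.

    Returns integer codes and the scale factor needed to dequantize.
    Hypothetical helper for illustration; not the paper's proposed modes.
    """
    qmax = 2 ** (num_bits - 1) - 1      # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax    # map the largest magnitude to qmax
    q = np.round(x / scale).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from integer codes."""
    return q.astype(np.float32) * scale

# Example: quantize random weights to 8 bits and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, s)
max_err = np.max(np.abs(w - w_hat))  # bounded by scale / 2
```

With round-to-nearest, the reconstruction error of each element is at most half a quantization step (scale / 2), which is why fewer bits trade accuracy for smaller storage and cheaper integer arithmetic.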