Quantization: how far should we go?

2022 25th Euromicro Conference on Digital System Design (DSD)

Abstract
Machine learning, and specifically Deep Neural Networks (DNNs), impacts all parts of daily life. Although DNNs can be large and compute intensive, requiring processing on big servers (e.g., in the cloud), we see a move of DNNs into IoT-edge based systems, adding intelligence to these systems. These systems are often energy constrained and too small to satisfy the huge compute and memory demands of DNNs. DNN model quantization may come to the rescue: instead of 32-bit floating-point numbers, much smaller formats can be used, down to 1-bit binary numbers. Although this largely solves the compute and memory problems, it comes at a steep price: reduced model accuracy. This problem has spawned a lot of research into model repair methods, especially for binary neural networks (BNNs). Heavy quantization triggers a lot of debate; we even see some movement back to higher precision using brain floats (bfloat16). This paper therefore evaluates the trade-off between energy reduction through extreme quantization and the resulting accuracy loss. The evaluation is based on ResNet-18 with the ImageNet dataset, mapped to a fully programmable architecture with special support for 8-bit and 1-bit deep learning, the BrainTTA. We show that, after applying repair methods, the use of extremely quantized DNNs makes sense: they have superior energy efficiency compared to DNNs using 8-bit weights and data, while having only slightly lower accuracy. An accuracy gap remains and requires further research, but the results are promising. A side effect of the much lower energy requirements of BNNs is that external DRAM becomes more dominant; this certainly requires further attention.
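To make the two weight formats contrasted in the abstract concrete, below is a minimal sketch (not the paper's BrainTTA mapping or its repair methods) of symmetric 8-bit integer quantization and XNOR-style 1-bit binarization with a per-tensor scaling factor; the NumPy helpers quantize_int8 and binarize are illustrative names, not from the paper.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor 8-bit quantization: w ~ scale * q, q in [-127, 127]."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def binarize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """1-bit binarization: w ~ alpha * sign(w), with alpha = mean(|w|)."""
    alpha = float(np.mean(np.abs(w)))
    b = np.where(w >= 0, 1, -1).astype(np.int8)
    return b, alpha

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in weight tensor

q8, s8 = quantize_int8(w)
b1, a1 = binarize(w)

# Reconstruction error grows as precision drops, mirroring the accuracy gap
# the paper weighs against the energy savings of 8-bit and 1-bit formats.
print("int8  MSE:", np.mean((w - s8 * q8.astype(np.float32)) ** 2))
print("1-bit MSE:", np.mean((w - a1 * b1.astype(np.float32)) ** 2))
```

The 1-bit representation shrinks weight storage by 32x relative to float32, which is the memory and energy argument the abstract makes, at the cost of the larger reconstruction error visible in the printed MSE values.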