Fast and High-Accuracy Approximate MAC Unit Design for CNN Computing

H. Xiao, H. Xu, X. Chen, Y. Wang

IEEE Embedded Systems Letters (2022)

Abstract
The multiply-and-accumulate (MAC) unit, composed of a set of multipliers and one reduction stage, dominates the latency and power of convolutional neural network (CNN) accelerators. Existing approximate multipliers reduce latency and power at a tolerable loss of accuracy, but they do not consider the data distribution and implicitly assume that the data are uniformly distributed. This letter shows that the activations and weights of practical CNNs usually follow a Gaussian-like distribution, so each bit of a quantized activation or weight is typically not 1 with a probability of 0.5. We therefore propose an approximate MAC unit design that takes the statistical features of the input data into account to achieve a balanced tradeoff among latency, power, and accuracy. Extensive experiments show that the proposed MAC unit provides much higher accuracy than state-of-the-art approximate circuits, while its latency, area, and power are similar.
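The claim about bit probabilities can be checked with a small experiment. The sketch below is illustrative only and is not the letter's method: it assumes 8-bit quantization of the magnitude of zero-mean Gaussian samples, with 4 standard deviations mapped to the largest code, and compares the per-bit probability of a 1 against uniformly distributed codes. The function names, the bit width, the scale, and the sample count are all hypothetical choices.

```python
import random

def bit_one_probabilities(codes, bits=8):
    """Return, for each bit position (LSB first), the fraction of codes with that bit set."""
    counts = [0] * bits
    for code in codes:
        for b in range(bits):
            counts[b] += (code >> b) & 1
    return [c / len(codes) for c in counts]

def quantize_magnitude(x, scale, bits=8):
    """Quantize |x| to an unsigned integer code, clipping at the largest code."""
    q = int(round(abs(x) / scale))
    return min(q, (1 << bits) - 1)

rng = random.Random(0)   # fixed seed so the run is repeatable
BITS = 8                 # assumed bit width
N = 20000                # assumed sample count

# Assumption: zero-mean Gaussian data, with 4 sigma mapped to the largest code.
sigma = 1.0
scale = 4 * sigma / ((1 << BITS) - 1)
gauss_codes = [quantize_magnitude(rng.gauss(0.0, sigma), scale, BITS) for _ in range(N)]

# Baseline: codes drawn uniformly from the full 8-bit range.
uniform_codes = [rng.randrange(1 << BITS) for _ in range(N)]

gauss_probs = bit_one_probabilities(gauss_codes, BITS)
uniform_probs = bit_one_probabilities(uniform_codes, BITS)

# Print from the most significant bit down to the least significant bit.
for b in range(BITS - 1, -1, -1):
    print(f"bit {b}: P(1) Gaussian = {gauss_probs[b]:.3f}, uniform = {uniform_probs[b]:.3f}")
```

Under these assumptions the most significant bit is set only when a sample exceeds roughly 2 standard deviations, so its probability of being 1 is far below 0.5, whereas every bit of the uniform baseline stays close to 0.5. A different quantization scheme, such as two's complement, would give different per-bit statistics.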
Keywords
Approximate computing, convolutional neural network, multiply and accumulate (MAC)