A Comparative Analysis of Half Precision Floating Point Representations in MACs for Deep Learning

Ernest Tolliver, Velu Pillai, Anshul Jha, Eugene John

2022 International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME), 2022

Abstract
Multiply-accumulate (MAC) units enable the parallel linear algebra operations needed to train and deploy most deep learning algorithms. Hardware that supports a high number of MAC operations per second is particularly beneficial during training. In this paper, we analyze the performance of four MAC units, namely Binary16, Brain Floating Point (BFloat16), Signed Half Precision (SHP CFloat16), and Unsigned Half Precision (UHP CFloat16). The MAC units are evaluated in terms of area and power at the 7nm, 16nm, and 28nm technology nodes at 1 GHz, while for the 45nm and 90nm nodes, area, power, and delay are evaluated. The Binary16, BFloat16, SHP CFloat16, and UHP CFloat16 units handle overflow, underflow, and normalization according to their respective standards. Among these floating point representations, SHP CFloat16 offers an increased dynamic range owing to its configurable exponent bias. With the Standard, Low Power, and Ultra Low Power technology libraries, the MAC using this representation consumes less power and area at 1 GHz than the Binary16 and UHP CFloat16 formats. When synthesized with a generic typical cell library, the SHP CFloat16 MAC uses more area and power than the BFloat16 and UHP CFloat16 formats.
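To make the format comparison concrete, below is a minimal Python sketch, not taken from the paper, that decodes a normalized 16-bit pattern from its exponent/mantissa split and bias. The Binary16 and BFloat16 field widths are the standard ones; the configurable-bias example is an illustrative assumption, since the abstract does not give the exact CFloat16 configuration used in the paper.

```python
# Minimal sketch (not from the paper) of how 16-bit floating point formats
# partition their bits, and how a configurable exponent bias (as in the
# CFloat16 variants) shifts the representable range without changing the
# layout. Only normalized patterns are decoded; subnormals, infinities, and
# NaNs are ignored for brevity.

def decode(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode a normalized pattern: (-1)^sign * 1.mantissa * 2^(exponent - bias)."""
    sign = (bits >> (exp_bits + man_bits)) & 0x1
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    value = (1 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)
    return -value if sign else value

# Binary16 (IEEE 754 half): 1 sign, 5 exponent, 10 mantissa bits, bias 15
print(decode(0x7BFF, 5, 10, 15))    # ~65504, largest normal Binary16 value
# BFloat16: 1 sign, 8 exponent, 7 mantissa bits, bias 127
print(decode(0x7F7F, 8, 7, 127))    # ~3.39e38, largest normal BFloat16 value
# Configurable-bias illustration (assumed bias of 100): the same bit pattern
# now reaches a larger magnitude, i.e. the dynamic range shifts with the bias
print(decode(0x7F7F, 8, 7, 100))    # ~4.5e46 with the assumed bias

# The MAC units themselves implement acc += a * b over streams of such
# operands; shown here with plain Python floats purely for illustration.
acc = 0.0
for a, b in [(1.5, 2.0), (0.25, 4.0)]:
    acc += a * b
```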
Keywords
Deep Learning, Multiplication and Accumulation Unit, Half-Precision Floating Point, Low Power, Ultra Low Power