Modified fused multiply-accumulate chained unit

Circuits and Systems (2014)

Abstract
Fused multiply-add (FMA) units can reduce latency and increase energy efficiency in arithmetic operations. This paper describes a modified architecture of a multiply-accumulate chained unit (MFMA). The pipelined add/sub datapath of a traditional fused multiply-add unit is modified to save hardware resources, conserve energy, and reduce latency in DSP applications. The proposed add/sub datapath is flexible and generic, and it can be used in any IEEE-754 compatible floating-point architecture as a replacement for the traditional multiply-accumulate chained unit. Both FMA and MFMA are implemented as nine-stage pipelined designs. The clock-limiting stage in both architectures is the normalization stage, which remains unchanged in the proposed architecture. The proposed three-input add/sub datapath is implemented on an FPGA, and the MFMA is implemented as an ASIC. In the FPGA implementation, the proposed add/sub datapath reduces area by 19.56% and power by 20.67%, and halves the latency, compared to two cascaded two-input add/sub datapaths. In the ASIC implementations, the MFMA reduces overall area by 7.16% and power by 5.69% relative to the classic FMA.
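As a rough numerical illustration of the comparison above between a three-input add/sub and two cascaded two-input add/sub datapaths, the following Python sketch models both at the level of rounding only. It is not the paper's datapath: the helper names (`round_f32`, `add3_cascaded`, `add3_single`) are hypothetical, and the sketch assumes that a three-input adder rounds once at the end, which the abstract does not state. Exact rational arithmetic stands in for the hardware, and only normal-range single-precision values are handled.

```python
from fractions import Fraction


def round_f32(x: Fraction) -> Fraction:
    """Round an exact rational to the nearest binary32 value (ties to even).

    Simplifying assumption: the result is in the normal range, so overflow
    and subnormals are not modelled.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    m = abs(x)
    # Find e such that 2**e <= m < 2**(e + 1).
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if Fraction(2) ** e > m:
        e -= 1
    # binary32 keeps 24 significant bits, so one ulp is 2**(e - 23).
    ulp = Fraction(2) ** (e - 23)
    n = round(m / ulp)  # round() on a Fraction uses ties-to-even
    return sign * n * ulp


def add3_cascaded(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Two cascaded two-input adds: the intermediate sum is rounded too."""
    return round_f32(round_f32(a + b) + c)


def add3_single(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Hypothetical three-input add modelled with one final rounding."""
    return round_f32(a + b + c)


if __name__ == "__main__":
    one = Fraction(1)
    half_ulp = Fraction(1, 2**24)  # half an ulp of 1.0 in binary32
    print("cascaded:", add3_cascaded(one, half_ulp, half_ulp))
    print("single:  ", add3_single(one, half_ulp, half_ulp))
```

With the inputs 1, 2^-24 and 2^-24, the cascaded model returns 1 because each half-ulp addend is rounded away in turn, while the single-rounding model returns 1 + 2^-23. The sketch says nothing about area, power, or latency, which are the quantities the abstract actually reports.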
Keywords
application specific integrated circuits, field programmable gate arrays, floating point arithmetic, pipeline arithmetic, ASIC, DSP, FPGA, IEEE-754 compatible floating point architecture, MFMA, add/sub pipelined datapath, area reduction, arithmetic operations, cascaded two-input add/sub datapaths, clock limiting stage, energy conservation, energy efficiency, hardware resources, latency reduction, modified fused multiply-accumulate chained unit, nine-stage pipelined design, normalization stage, floating point add datapath, fused multiply-add (FMA), ieee-754 forma, multiply-accumulate chained unit (MAC), single precision floating point