Guarding the guards: Enhancing LNS performance for common applications

2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2016

Abstract
The rounding modes of floating point arithmetic are usually simplified in implementations of alternative number systems, including the Logarithmic Number System (LNS), to a single round-to-nearest mode requiring internal guard bits that are exponentially expensive to provide. Noting that rounding takes significant time and hardware, this paper describes two innovations that enhance LNS performance, and we analyze the possible improvements for FFT, matrix inversion, and other application-specific examples where dedicated processor hardware may be preferred.

The first innovation is to retain guard bit information for subsequent reuse, leading to relaxed interpolation requirements (i.e., reduced storage and potentially faster multiplication) within the usual logarithmic ALU. Further, we propose that the user may allow results to be returned without rounding, for the many applications where low latency takes priority over higher accuracy. For the sake of clarity in this investigation, a linear interpolator using cotransformation is employed, although these ideas are directly compatible with other published techniques to improve LNS system-level metrics, including higher-order interpolators. We define generalized error characteristics for this scheme, yielding zero average error at the cost of slight complexity in table addressing. Rather than 'correctly round' all results, we demonstrate that keeping the guard information can provide at least 1 bit of additional effective accuracy (a two-fold improvement in r.m.s. noise) to the final result without increasing interpolation cost. The storage for interpolation coefficients is reduced to around one fifth (an 80% reduction) when compared to the most similar published device, with error characteristics better than minimax. To exercise the new error model and the impact of the guard field, experiments have been conducted over a range of wordlengths for important application algorithms. Here we present results for FFT and Gauss-Jordan matrix inversion. The hardware impact is assessed through synthesis targeting a contemporary entry-level FPGA, which confirms that the approach is effectively neutral from an implementation standpoint (a larger number of registers is required for our design, while fewer logic block slices are needed). The theoretical basis and experimental validation together serve to change the trade space in which such alternative arithmetic has hitherto competed.
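For readers unfamiliar with LNS, the following minimal Python sketch illustrates the arithmetic the abstract refers to. It is not the paper's design: the function names, the `frac_bits` and `guard_bits` parameters, and the round-half-up rule are assumptions made for illustration, and `math.log2` stands in for the table-plus-interpolation hardware that evaluates log2(1 + 2^d). Only positive values are handled, so no sign bit or subtraction path is shown.

```python
import math

def to_lns(x, frac_bits):
    """Encode a positive real as a fixed-point base-2 logarithm (an integer)."""
    return round(math.log2(x) * (1 << frac_bits))

def from_lns(e, frac_bits):
    """Decode a fixed-point base-2 logarithm back to a real value."""
    return 2.0 ** (e / (1 << frac_bits))

def lns_mul(a, b):
    """Multiplication is a plain integer addition of the two logarithms."""
    return a + b

def lns_add(a, b, frac_bits):
    """Addition uses log2(2^a + 2^b) = max(a, b) + log2(1 + 2^d), with d <= 0.

    In hardware the term log2(1 + 2^d) is the function that is tabulated
    and interpolated; here it is simply computed in floating point.
    """
    hi, lo = max(a, b), min(a, b)
    d = (lo - hi) / (1 << frac_bits)
    sb = math.log2(1.0 + 2.0 ** d)
    return hi + round(sb * (1 << frac_bits))

def round_off_guard(e, guard_bits):
    """Drop guard_bits low-order bits, rounding half up (assumed rule)."""
    if guard_bits == 0:
        return e
    half = 1 << (guard_bits - 1)
    return (e + half) >> guard_bits
```

Working with `frac_bits = F + G` and calling `round_off_guard(e, G)` only when a result leaves the computation corresponds loosely to the idea of retaining guard information between operations; rounding after every `lns_add` corresponds to the conventional approach.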
Keywords
logic block slices,FPGA,field programmable gate arrays,floating point arithmetic,logarithmic number system,LNS system-level metrics,round-to-nearest mode,internal guard bits,fast Fourier transforms,FFT,processor hardware,guard bit information,relaxed interpolation requirements,logarithmic ALU,linear interpolator,higher-order interpolators,interpolation coefficients,Gauss-Jordan matrix inversion