MULTIPLY-ADD OPTIMIZED FFT KERNELS

MATHEMATICAL MODELS & METHODS IN APPLIED SCIENCES (2011)

Abstract
Modern computer architectures provide a special instruction, the fused multiply-add (FMA) instruction, that performs a multiplication and an addition in a single operation. In this paper, newly developed radix-2, radix-3, and radix-5 FFT kernels that efficiently take advantage of this powerful instruction are presented. If a processor is provided with FMA instructions, the radix-2 FFT algorithm introduced has the lowest complexity of all Cooley-Tukey radix-2 algorithms. All floating-point operations are executed as FMA instructions. Compared to conventional radix-3 and radix-5 kernels, the new radix-3 and radix-5 kernels greatly improve the utilization of FMA instructions, which results in a significant reduction in complexity. In general, the advantages of the FFT algorithms presented in this paper are their low arithmetic complexity, their high efficiency, and their striking simplicity. Numerical experiments show that FFT programs using the new kernels clearly outperform even the best conventional FFT routines.