MULTIPLY-ADD OPTIMIZED FFT KERNELS
MATHEMATICAL MODELS & METHODS IN APPLIED SCIENCES (2011)
Abstract
Modern computer architectures provide a special instruction, the fused multiply-add (FMA) instruction, which performs a multiplication and an addition in a single operation. This paper presents newly developed radix-2, radix-3, and radix-5 FFT kernels that take efficient advantage of this instruction. On a processor that provides FMA instructions, the radix-2 FFT algorithm introduced here has the lowest complexity of all Cooley-Tukey radix-2 algorithms, and all of its floating-point operations are executed as FMA instructions. Compared to conventional radix-3 and radix-5 kernels, the new radix-3 and radix-5 kernels greatly improve the utilization of FMA instructions, which results in a significant reduction in complexity. In general, the advantages of the FFT algorithms presented in this paper are their low arithmetic complexity, their high efficiency, and their striking simplicity. Numerical experiments show that FFT programs using the new kernels clearly outperform even the best conventional FFT routines.
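To make the idea of an all-FMA radix-2 kernel concrete, the following is a minimal sketch of one well-known way to write a radix-2 butterfly using only multiply-add operations. It is not taken from the paper and may differ from the kernels presented there. It assumes the twiddle factor is stored as its cosine `c` and tangent `t`, so that w = c(1 + it); this requires c ≠ 0, and twiddles with a zero real part would need separate handling. The names `butterfly_fma`, `c`, and `t` are illustrative. `math.fma` is only available in Python 3.13 and later; the fallback used otherwise is not fused and only preserves the operation structure.

```python
import cmath
import math

# math.fma exists in Python 3.13+. The fallback below is NOT a true fused
# operation (it rounds twice); it only keeps the multiply-add structure.
fma = getattr(math, "fma", lambda a, b, c: a * b + c)


def butterfly_fma(x0, x1, c, t):
    """Radix-2 butterfly: y0 = x0 + w*x1, y1 = x0 - w*x1, with w = c*(1 + i*t).

    Assumes c != 0 (hypothetical storage scheme: cosine and tangent of the
    twiddle angle). Every floating-point operation is one multiply-add.
    """
    # u = (1 + i*t) * x1, i.e. w*x1 divided by c: two multiply-adds
    ur = fma(-t, x1.imag, x1.real)
    ui = fma(t, x1.real, x1.imag)
    # y0 = x0 + c*u and y1 = x0 - c*u: four multiply-adds
    y0 = complex(fma(c, ur, x0.real), fma(c, ui, x0.imag))
    y1 = complex(fma(-c, ur, x0.real), fma(-c, ui, x0.imag))
    return y0, y1


if __name__ == "__main__":
    w = cmath.exp(-0.3j)                  # example twiddle factor
    x0, x1 = complex(1.0, 2.0), complex(-0.5, 0.75)
    y0, y1 = butterfly_fma(x0, x1, w.real, w.imag / w.real)
    print(abs(y0 - (x0 + w * x1)), abs(y1 - (x0 - w * x1)))
```

In this form the butterfly uses six multiply-add operations, whereas the textbook formulation needs four multiplications and six additions (a complex multiplication followed by a complex addition and subtraction).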