OpenCL™ FFT Optimizations for Intel® Processor Graphics
IWOCL (2016)
Abstract
In this paper, we explore a number of OpenCL™ optimization strategies and show their pros and cons relative to clFFT, the leading OpenCL Fast Fourier Transform (FFT) library. We implemented a 1D, multi-kernel, mixed-radix, power-of-two Cooley-Tukey algorithm that improves upon clFFT in many of the cases under consideration. The computation is broken down into a set of auto-generated smaller-base FFTs that fit in the execution unit registers, avoiding the use of local memory and barriers. Our implementation achieves high thread occupancy and high memory bandwidth for a wide range of global and local sizes.
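The decomposition into smaller-base FFTs can be illustrated with a minimal scalar sketch. This is not the paper's OpenCL implementation or its generated kernels; it is a plain Python illustration of a mixed-radix, power-of-two Cooley-Tukey split. The names `small_dft` and `mixed_radix_fft` and the value `BASE = 8` are assumptions chosen for the example. The direct DFT stands in for a base FFT small enough to be held entirely in registers.

```python
import cmath

# Assumed largest base transform size; a stand-in for a "register-resident" base FFT.
BASE = 8


def small_dft(x):
    """Direct O(n^2) DFT, used here as the small base transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def mixed_radix_fft(x):
    """Decimation-in-time Cooley-Tukey FFT for power-of-two lengths.

    Splits the input with radix BASE until the remaining length fits in a
    single base transform (which may be smaller than BASE, hence mixed radix).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n <= BASE:
        return small_dft(x)

    r = BASE          # radix used at this level
    m = n // r        # length of each sub-transform
    # Transform the r interleaved sub-sequences x[j], x[j + r], x[j + 2r], ...
    subs = [mixed_radix_fft(x[j::r]) for j in range(r)]

    out = [0j] * n
    for k in range(m):
        # Apply twiddle factors W_n^(j*k), then combine with one radix-r base DFT.
        t = [subs[j][k] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(r)]
        col = small_dft(t)
        for q in range(r):
            out[k + m * q] = col[q]
    return out
```

For a length of 32, this sketch performs one radix-8 combine step over sub-transforms of length 4, so two different base sizes are used in the same transform. In a GPU setting, each base transform would be a candidate for a single work-item operating on private values, which is the property the abstract relies on to avoid local memory and barriers.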