Auto-Tuning Of Fast Fourier Transform On Graphics Processors

ACM SIGPLAN Notices (2011)

Abstract
We present an auto-tuning framework for FFTs on graphics processors (GPUs). Due to the complex design of the memory and compute subsystems on GPUs, the performance of FFT kernels can vary widely over the range of possible input parameters. We generate several variants for each component of the FFT kernel that are likely to perform well in different cases. Our auto-tuner composes variants to generate kernels and selects the best ones. We present heuristics to prune the search space and profile only a small fraction of all possible kernels. We compose optimized kernels to improve the performance of larger FFT computations. We implement the system using the NVIDIA CUDA API and compare its performance to state-of-the-art FFT libraries. On a range of NVIDIA GPUs and input sizes, our auto-tuned FFTs outperform the NVIDIA CUFFT 3.0 library by up to 38x and deliver up to 3x higher performance than a manually tuned FFT.
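The generate, compose, prune, profile, and select loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it runs on the CPU in plain Python rather than CUDA, and the two tunable components (`LEAF_SIZES`, `TWIDDLES`), the pruning rule in `prune`, and all function names are hypothetical choices made only for illustration. The abstract does not specify the actual kernel components or heuristics.

```python
import cmath
import itertools
import time

def naive_dft(x):
    """Direct O(n^2) DFT, used as the leaf kernel and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def make_twiddle(strategy, n):
    """Component variant: look twiddle factors up in a table or recompute them."""
    if strategy == "table":
        table = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
        return lambda k: table[k]
    return lambda k: cmath.exp(-2j * cmath.pi * k / n)

def make_kernel(leaf, twiddle_strategy, n):
    """Compose one leaf-size variant and one twiddle variant into a radix-2 FFT."""
    w = make_twiddle(twiddle_strategy, n)

    def fft(x):
        m = len(x)
        if m <= leaf:                      # small sub-problem: direct DFT
            return naive_dft(x)
        even, odd = fft(x[0::2]), fft(x[1::2])
        stride, half = n // m, m // 2      # W_m^k equals W_n^(k * n/m)
        out = [0j] * m
        for k in range(half):
            t = w(k * stride) * odd[k]
            out[k] = even[k] + t
            out[k + half] = even[k] - t
        return out

    return fft

# Hypothetical variant sets; the paper's real components are not listed in the abstract.
LEAF_SIZES = (1, 2, 4, 8, 16, 32)
TWIDDLES = ("table", "on_the_fly")

def candidates(n):
    """Full search space: every combination of component variants."""
    return list(itertools.product(LEAF_SIZES, TWIDDLES))

def prune(configs, n):
    """Assumed heuristic: discard leaves larger than the input or larger than 8."""
    return [(leaf, tw) for leaf, tw in configs if leaf <= min(n, 8)]

def autotune(n, repeats=3):
    """Profile the pruned candidates for a power-of-two n and return the fastest."""
    x = [complex(i % 7, -(i % 3)) for i in range(n)]
    best = None
    for leaf, tw in prune(candidates(n), n):
        kernel = make_kernel(leaf, tw, n)
        elapsed = float("inf")
        for _ in range(repeats):           # keep the best of several timings
            start = time.perf_counter()
            kernel(x)
            elapsed = min(elapsed, time.perf_counter() - start)
        if best is None or elapsed < best[0]:
            best = (elapsed, (leaf, tw), kernel)
    return best[1], best[2]
```

Calling `autotune(64)` returns the selected `(leaf, twiddle)` pair together with the composed kernel. Which pair wins depends on the machine, which is the reason for profiling rather than fixing one variant in advance.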
Keywords
Performance, Algorithms, Fast Fourier Transform, FFT, GPU, high performance, auto-tuning, performance analysis, performance tuning