Fast Scan Algorithms On Graphics Processors

ICS(2008)

引用 103|浏览86
暂无评分
摘要
Scan and segmented scan are important data-parallel primitives for a wide range of applications. We present fast, work-efficient algorithms for these primitives on graphics processing units (GPUs). We use novel data representations that map well to the GPU architecture. Our algorithms exploit shared memory to improve memory performance. We further improve the performance of our algorithms by eliminating shared-memory bank conflicts and reducing the overheads in prior shared-memory GPU algorithms. Furthermore, our algorithms are designed to work well on general data sets, including segmented arrays with arbitrary segment lengths. We also present optimizations to improve the performance of segmented scans based on the segment lengths. We implemented our algorithms on a PC with an NVIDIA GeForce 8800 GPU and compared our results with prior GPU-based algorithms. Our results indicate up to IN higher performance over prior algorithms on input sequences with millions of elements.
更多
查看译文
关键词
Scan,all-prefix-sum,segmented scan,GPGPU,GPU,parallel algorithm,HPC,many-core
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要