An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU

Parallel & Distributed Processing Symposium (2011)

Cited by 112
Abstract
We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting each system into smaller ones that are then solved on-chip. The multi-stage nature of our method, together with varying workloads and GPUs of different capabilities, calls for an auto-tuning strategy that carefully selects the switch points between computation stages. In particular, we show two ways to prune the tuning space effectively and thus avoid an impractical exhaustive search: (1) applying algorithmic knowledge to decouple tuning parameters, and (2) estimating search starting points from GPU architecture parameters. We demonstrate that auto-tuning is a powerful tool. It improves performance by up to 5x and reduces execution time by 17% and 32% on average compared with static and dynamic tuning, respectively. It also enables our multi-stage solver to outperform the Intel MKL tridiagonal solver by 6-11x on many parallel tridiagonal systems.
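The abstract does not say how a large system is split into smaller ones. Parallel cyclic reduction (PCR) is a common technique for this, and its use here is an assumption. After k PCR steps with strides 1, 2, ..., 2^(k-1), the rows whose indices agree modulo 2^k form 2^k independent tridiagonal systems. Each of these is small enough to solve on its own.

The CPU-side sketch below solves each subsystem with the Thomas algorithm. The names `pcr_step`, `thomas`, `solve_split` and the parameter `k` are hypothetical. The sketch also assumes a diagonally dominant system (so no pivoting is needed), with `a[0] == 0` and `c[-1] == 0`. The loops are serial here. In a GPU kernel, each row update in `pcr_step` and each subsystem solve could map to a thread.

```python
from typing import List, Tuple

Vec = List[float]


def pcr_step(a: Vec, b: Vec, c: Vec, d: Vec, s: int) -> Tuple[Vec, Vec, Vec, Vec]:
    """One parallel cyclic reduction (PCR) step with stride s (assumed splitting method).

    On entry, row i reads a[i]*x[i-s] + b[i]*x[i] + c[i]*x[i+s] = d[i].
    Rows i-s and i+s are used to eliminate x[i-s] and x[i+s], so on exit
    row i couples x[i-2s], x[i] and x[i+2s]. Every row is updated from the
    old values only, so on a GPU each row could be handled by one thread.
    """
    n = len(b)
    na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
    for i in range(n):
        if i - s >= 0:  # eliminate x[i-s] using row i-s
            k1 = a[i] / b[i - s]
            na[i] = -a[i - s] * k1
            nb[i] -= c[i - s] * k1
            nd[i] -= d[i - s] * k1
        if i + s < n:  # eliminate x[i+s] using row i+s
            k2 = c[i] / b[i + s]
            nc[i] = -c[i + s] * k2
            nb[i] -= a[i + s] * k2
            nd[i] -= d[i + s] * k2
    return na, nb, nc, nd


def thomas(a: Vec, b: Vec, c: Vec, d: Vec) -> Vec:
    """Serial Thomas algorithm without pivoting; a[0] and c[-1] are not used."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_split(a: Vec, b: Vec, c: Vec, d: Vec, k: int) -> Vec:
    """Split with k PCR steps into 2**k independent systems, then solve each.

    Assumes a[0] == 0 and c[-1] == 0. Rows r, r + 2**k, r + 2*2**k, ...
    form one subsystem; in a GPU setting each would be sized to fit on-chip.
    """
    n = len(b)
    s = 1
    for _ in range(k):  # splitting stage
        a, b, c, d = pcr_step(a, b, c, d, s)
        s *= 2
    x = [0.0] * n
    for r in range(min(s, n)):  # independent small solves
        rows = range(r, n, s)
        xs = thomas([a[i] for i in rows], [b[i] for i in rows],
                    [c[i] for i in rows], [d[i] for i in rows])
        for i, v in zip(rows, xs):
            x[i] = v
    return x
```

With `k = 0` this reduces to a single Thomas solve. Each extra step halves the subsystem size and doubles the number of subsystems, at the cost of another O(n) pass of PCR work. In this sketch, `k` is a plain parameter. It stands in for the kind of stage-switch decision the abstract describes as auto-tuned.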
Keywords
gpu architecture parameter, tuning parameter, dynamic tuning, multi-stage characteristic, multi-stage method, parallel tridiagonal system, large tridiagonal systems, large tridiagonal system, auto-tuned method, multi-stage solver, intel mkl tridiagonal solver, tuning space, kernel, exhaustive search, memory management, tuning, shared memory, switches, instruction sets, parallel processing, coprocessors, chip