Performance Modeling for GPU Architectures (2012)

Cited by 23
Abstract
Recent years have seen a great deal of interest in developing high-performance general-purpose applications for GPUs. However, today's performance tuning practice still demands manual measurement and paper-and-pencil analysis from GPU programmers. In this dissertation, we present a performance model that helps GPU programmers and architects identify performance bottlenecks, analyze their causes, and quantitatively predict the effectiveness of potential optimizations. In particular, we use a microbenchmark-based approach to model three major components of GPU execution time: the instruction pipeline, shared memory access, and global memory access. We demonstrate our model by applying it to three types of applications: tridiagonal solvers, matrix multiplication, and image error diffusion. Driven by our model's analysis, we study the performance characteristics of each application, propose a variety of algorithmic and programming optimizations, and perform experiments confirming the significant performance gains the model predicts. Moreover, applying our model to these case studies reveals potential architectural improvements in hardware resource allocation, bank-conflict avoidance, work scheduling, and memory transaction granularity.
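The abstract does not reproduce the model's equations. As a rough illustration only of how microbenchmark-derived component costs might combine into a kernel-time estimate, the sketch below uses a bottleneck-style rule (the slowest of the overlapped components dominates); all parameter names and numeric values here are hypothetical, not taken from the dissertation.

```python
# Hypothetical sketch of a component-based GPU execution-time estimate.
# Per-component costs would be calibrated by microbenchmarks; the numbers
# and the max() combination rule below are illustrative assumptions only.

def predict_cycles(n_inst, n_smem, n_gmem,
                   cpi=4.0,         # assumed cycles per pipelined instruction
                   smem_lat=38.0,   # assumed shared-memory access latency
                   gmem_lat=440.0,  # assumed global-memory access latency
                   warps=8):        # concurrent warps available to hide latency
    """Estimate kernel cycles from instruction, shared-memory,
    and global-memory counts per warp."""
    pipeline = n_inst * cpi
    shared = n_smem * smem_lat / warps
    gmem = n_gmem * gmem_lat / warps
    # Bottleneck assumption: components overlap, so the slowest dominates.
    return max(pipeline, shared, gmem)

# Example: a kernel issuing 1000 instructions, 200 shared-memory and
# 50 global-memory accesses per warp is pipeline-bound under these numbers.
print(predict_cycles(n_inst=1000, n_smem=200, n_gmem=50))
```

Such a model makes optimization trade-offs explicit: reducing global-memory traffic only helps while the global-memory term is the maximum, which mirrors the kind of bottleneck analysis the abstract describes.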
Keywords
global memory access, performance bottleneck, performance modeling, GPU execution time, GPU architecture, paper-and-pencil analysis, GPU programmer, shared memory access, memory transaction granularity, performance characteristic, significant performance gain, performance model