Performance modeling of graphics processing unit application using static and dynamic analysis

Concurrency and Computation: Practice and Experience (2022)

Abstract
Graphics processing units (GPUs) have become an integral part of high-performance computing in the drive toward exascale performance. Understanding and estimating GPU performance is crucial for developers to design performance-driven as well as energy-efficient applications for a given architecture. This work presents a model, developed using static analysis of CUDA code, that predicts the execution time of NVIDIA GPU kernels without running them. The PTX code is statically analyzed to extract instruction features, control flow, and data dependence. We propose a scheduling algorithm that satisfies resource reservation constraints to schedule these instructions in threads across streaming multiprocessors (SMs). We use dynamic analysis to build a set of memory access penalty models and combine these models with the scheduling information to estimate the execution time of the code. We present experimental results showing that this approach works across NVIDIA GPU architectures. We first tested our model on two Kepler machines, where the mean percentage error (MPE)/mean absolute percentage error (MAPE) was -8.88%/28.3% for Tesla K20 and -5.66%/29.4% for Quadro K4200. We further tested the model on Maxwell and Pascal architectures and recorded the MPEs/MAPEs to be -10.64%/47.8% and -3.94%/28.5%, respectively.
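The abstract describes a pipeline that statically tallies PTX instructions and then applies dynamically calibrated memory penalty models. As a rough illustration only, and not the authors' implementation, the Python sketch below classifies instructions in a PTX dump (e.g., produced by `nvcc --ptx`) into coarse classes and combines them with assumed per-class latencies and a hypothetical memory stall penalty to produce a crude per-thread cycle estimate. The latency table, the `MEM_PENALTY_CYCLES` constant, and the classification rules are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch: a crude static tally of PTX instruction classes combined
# with assumed latencies and a hypothetical memory penalty. Not the paper's model.
import re
from collections import Counter

# Assumed per-class issue latencies in cycles (illustrative values, not measured).
LATENCY = {"int": 4, "float": 4, "special": 16, "load": 8, "store": 8, "control": 4}
MEM_PENALTY_CYCLES = 200  # hypothetical average global-memory stall penalty


def classify(opcode: str) -> str:
    """Map a PTX opcode to a coarse instruction class."""
    if opcode.startswith("ld"):
        return "load"
    if opcode.startswith("st"):
        return "store"
    if opcode.startswith(("bra", "ret", "bar", "call")):
        return "control"
    if opcode.startswith(("sin", "cos", "sqrt", "rcp", "ex2", "lg2")):
        return "special"
    if ".f" in opcode:
        return "float"
    return "int"


def estimate_cycles(ptx_text: str) -> int:
    """Very rough per-thread cycle estimate from a PTX kernel body."""
    counts = Counter()
    for line in ptx_text.splitlines():
        line = line.strip()
        # Skip directives, comments, labels, and braces; match an opcode otherwise.
        m = re.match(r"([a-z][\w.]*)\b", line)
        if not m:
            continue
        counts[classify(m.group(1))] += 1
    compute = sum(LATENCY[c] * n for c, n in counts.items())
    # Charge an assumed stall penalty per load; the paper's dynamic analysis
    # would instead calibrate such penalties per architecture.
    memory = counts["load"] * MEM_PENALTY_CYCLES
    return compute + memory


# Example usage: estimate_cycles(open("kernel.ptx").read())
```

This sketch ignores warp scheduling, occupancy, and SM-level resource reservation, which the paper's scheduling algorithm accounts for; it is meant only to convey how a static instruction tally and a memory penalty term might be combined.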
Keywords
analytical model, GPGPU, high-performance computing, performance prediction