CUDAsap: Statically-Determined Execution Statistics as Alternative to Execution-Based Profiling

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2023

Abstract
Many different GPU types exist today, which raises questions about high-level tasks such as provisioning and scheduling. To predict execution time on different GPU types accurately, we propose a method that obtains execution statistics from compile-time static code analysis, in which the control flow graph of the code's basic blocks is determined. This graph is represented as an adjacency matrix and used in a system of linear equations to calculate the basic block execution frequencies. The analysis does not require executing the kernel. We evaluate the proposed method on five different benchmark suites, showing that 76 out of 79 evaluated kernels can be analyzed with an average error of 0.4%, which stems primarily from differing LLVM versions, and an average prediction time of 203.96 ms. Furthermore, repetitive kernels make memoization effective, and the underlying analysis is largely independent of problem size.
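The idea of turning a control flow graph into a linear system can be illustrated with a small sketch. The abstract does not give the exact formulation, so the following is only one common way to set it up, under stated assumptions: each entry `weights[i][j]` of the adjacency matrix holds an assumed probability that control passes from basic block `i` to block `j`, and the execution frequencies `f` satisfy `f = e + W^T f`, where `e` marks the entry block. The function name `block_frequencies`, the example graph, and the branch probability are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def block_frequencies(weights, entry=0):
    """Solve f = e + W^T f for per-block execution counts f.

    weights[i][j] is the assumed probability that control moves from
    basic block i to basic block j (0 where the CFG has no edge).
    This formulation is an illustrative assumption, not the paper's.
    """
    n = len(weights)
    # Build the augmented matrix for (I - W^T) f = e.
    aug = []
    for j in range(n):
        row = [Fraction(int(i == j)) - Fraction(weights[i][j]) for i in range(n)]
        row.append(Fraction(int(j == entry)))
        aug.append(row)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system (for example, a loop with no exit)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


# Hypothetical CFG: 0 = entry, 1 = loop header, 2 = loop body, 3 = exit.
# The header is assumed to enter the body with probability 10/11,
# which corresponds to a loop with 10 iterations.
p = Fraction(10, 11)
cfg = [
    [0, 1, 0, 0],      # entry  -> header
    [0, 0, p, 1 - p],  # header -> body or exit
    [0, 1, 0, 0],      # body   -> header (back edge)
    [0, 0, 0, 0],      # exit has no successors
]
freqs = block_frequencies(cfg)
print([int(f) for f in freqs])  # prints [1, 11, 10, 1]
```

In this toy graph the loop header runs 11 times and the body 10 times, without ever executing the code. Exact fractions are used here only to keep the example free of rounding noise; a floating-point solver would serve the same purpose for larger graphs.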
Keywords
performance modeling,CUDA,static analysis