Low-Overhead Trace Collection and Profiling on GPU Compute Kernels

Sébastien Darche, Michel R. Dagenais

ACM Transactions on Parallel Computing (2023)

Abstract
While GPUs can bring substantial speedups to compute-intensive tasks, programming them is notoriously hard. From the programming model to microarchitectural particularities, the programmer may encounter many pitfalls that hinder performance in obscure ways. Numerous performance analysis tools provide helpful data on the efficiency of compute kernels, but few allow the programmer to efficiently gather runtime information directly on the device and pinpoint the sections to optimize. In this paper, we propose an instrumentation method that collects traces while the compute kernel executes, with reduced overhead compared to other approaches, by exploiting the inherently parallel behavior of GPUs and compartmentalizing tracing phases. The reference implementation is freely available and induces an average overhead of 1.6× on a popular scientific computing benchmark and 1.5× over the kernel execution time. This represents an order-of-magnitude improvement over similar work and proves useful for timing-guided optimizations. The tool generates insightful execution traces and timestamps, which can be analyzed to better understand performance issues in the kernel.
Keywords
GPU Programming, Software Tracing, Performance Analysis