CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications
2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), 2019
Abstract
GPUs are powerful, massively parallel processors that require a vast amount of thread parallelism to keep their thousands of execution units busy and to tolerate latency when accessing their high-throughput memory systems. Understanding the behavior of massively threaded GPU programs can be difficult, even though recent GPUs provide an abundance of hardware performance counters that collect statistics about certain events. Profiling tools that assist users in such analysis, like NVIDIA's nvprof and CUPTI, are state-of-the-art. However, instrumentation based on reading hardware performance counters can be slow, in particular when the number of metrics is large. Furthermore, the results can be inaccurate because instructions are grouped to match the available set of hardware counters.
Keywords
GPU, CUDA, LLVM, Profiling, PTX