CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications

2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)(2019)

引用 22|浏览20
暂无评分
摘要
GPUs are powerful, massively parallel processors, which require a vast amount of thread parallelism to keep their thousands of execution units busy, and to tolerate latency when accessing its high-throughput memory system. Understanding the behavior of massively threaded GPU programs can be difficult, even though recent GPUs provide an abundance of hardware performance counters, which collect statistics about certain events. Profiling tools that assist the user in such analysis for their GPUs, like NVIDIA's nvprof and cupti, are state-of-the-art. However, instrumentation based on reading hardware performance counters can be slow, in particular when the number of metrics is large. Furthermore, the results can be inaccurate as instructions are grouped to match the available set of hardware counters.
更多
查看译文
关键词
GPU,CUDA,LLVM,Profiling,PTX
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要