A SIMT Analyzer for Multi-Threaded CPU Applications

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2022

Abstract
The use of GPUs for general purpose applications has drastically increased. However, the performance gain from porting multithreaded CPU workloads to massively parallel SIMT-based accelerators, like GPUs, is often unpredictable. Even with enough parallelism, programmers do not know if their CPU code will run well on a GPU without first investing the effort to refactor it into a GPGPU programming language. Most of this unpredictability stems from two key side-effects of the GPU’s energy-efficient SIMT hardware: control-flow and memory divergence.

To alleviate this issue, we propose SIMTec, an analysis tool that computes the control-flow and memory divergence of arbitrary pre-compiled CPU binaries. The tool constructs and analyzes a dynamic control flow graph of the application, batches threads into warps and emulates the operation of a SIMT stack for each warp to compute the projected SIMT efficiency. Given each warp’s execution mask, memory coalescing is computed using the addresses accessed by memory instructions from parallel threads. The tool reports the SIMT efficiency and memory divergence characteristics.

We validate SIMTec using a suite of 11 applications with both x86 CPU and CUDA GPU implementations on an NVIDIA Volta V100, demonstrating that SIMTec has a correlation factor of 1.00 and 0.98 for SIMT efficiency and memory divergence, respectively. To demonstrate the predictive power of SIMTec, we explore another 16 CPU workloads for which there is no 1:1 GPU implementation. We perform case studies on these applications that range from compute-intensive thread-parallel workloads to cloud-based request-parallel microservices. Using SIMTec, we demonstrate that many of these CPU-only workloads are amenable to SIMT acceleration as-is.
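
As a rough illustration of the two metrics named in the abstract, the Python sketch below computes a SIMT efficiency from per-instruction execution masks and counts the memory segments touched by one warp-level memory instruction. It is not SIMTec's implementation: the function names, the warp width of 32 and the 32-byte segment granularity are assumptions made for this example, and the execution masks are taken as given rather than derived by emulating a SIMT stack over a control flow graph.

```python
WARP_SIZE = 32      # assumption: NVIDIA-style warp width
SEGMENT_BYTES = 32  # assumption: granularity of one memory transaction


def batch_into_warps(thread_ids, warp_size=WARP_SIZE):
    """Group consecutive thread IDs into warps of at most warp_size threads."""
    return [thread_ids[i:i + warp_size]
            for i in range(0, len(thread_ids), warp_size)]


def simt_efficiency(active_masks, warp_size=WARP_SIZE):
    """Average fraction of lanes active per issued warp instruction.

    active_masks holds one integer bitmask per issued warp instruction;
    bit i is set when lane i executes that instruction.
    """
    if not active_masks:
        return 0.0
    active_lanes = sum(bin(mask).count("1") for mask in active_masks)
    return active_lanes / (warp_size * len(active_masks))


def memory_transactions(addresses, segment_bytes=SEGMENT_BYTES):
    """Count distinct segments touched by one warp-level memory instruction.

    addresses holds the byte addresses accessed by the active lanes only.
    Fewer segments means better coalescing.
    """
    return len({addr // segment_bytes for addr in addresses})


# One instruction with all 32 lanes active, one with only 16 active.
print(simt_efficiency([0xFFFFFFFF, 0x0000FFFF]))                    # 0.75

# Unit-stride 4-byte accesses fall into 4 segments of 32 bytes.
print(memory_transactions([4 * lane for lane in range(32)]))        # 4

# A 128-byte stride puts every lane in its own segment.
print(memory_transactions([128 * lane for lane in range(32)]))      # 32
```

In this sketch a fully converged warp scores an efficiency of 1.0, and divergent branches lower the score because each side of the branch issues with only part of the mask active.
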
Keywords
SIMT analyzer, general purpose applications, performance gain, multithreaded CPU workloads, SIMT-based accelerators, parallelism, CPU code, GPGPU programming language, key side-effects, control-flow, SIMTec, analysis tool, arbitrary pre-compiled CPU binaries, tool constructs, dynamic control flow graph, warps, SIMT stack, memory coalescing, memory instructions, parallel threads, memory divergence characteristics, compute-intensive thread-parallel workloads, request-parallel microservices, CPU-only workloads, SIMT acceleration, GPU energy-efficient SIMT hardware