Sieve: Stratified GPU-Compute Workload Sampling

ISPASS (2023)

Abstract
To exploit the ever-increasing compute capabilities offered by GPU hardware, GPU-compute workloads have evolved from simple computational kernels to large-scale programs with complex software stacks and numerous kernels. Driving architecture exploration with real workloads therefore becomes increasingly challenging, to the point of becoming intractable because of extremely long simulation times with existing architecture simulators. Sampling is a widely used technique to speed up simulation; however, the state-of-the-art sampling method for GPU-compute workloads, Principal Kernel Selection (PKS), falls short for challenging GPU-compute workloads with a large number of kernels and kernel invocations. This paper presents Sieve, an accurate and low-overhead stratified sampling methodology for GPU-compute workloads that groups kernel invocations based on their instruction count, with the goal of minimizing execution-time variability within strata. For the challenging Cactus and MLPerf workloads, we report that Sieve achieves an average prediction error of 1.2% (and at most 3.2%) versus 16.5% (and up to 60.4%) for PKS on real hardware (an Nvidia Ampere GPU), while maintaining a similar simulation speedup of three orders of magnitude. We further demonstrate that Sieve reduces profiling time by 8x on average (and up to 98x) compared to PKS.
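
To make the core idea concrete, the following is a minimal sketch of stratified sampling of kernel invocations by instruction count, in the spirit of the methodology described in the abstract. The stratum boundaries (equal-width bins), the choice of one random representative per stratum, and the instruction-share weighting are illustrative assumptions for this sketch, not the paper's exact algorithm.

import random
from collections import defaultdict

def stratify_by_instruction_count(invocations, num_strata=10):
    """Group kernel invocations into strata of similar instruction count.

    `invocations` is a list of (invocation_id, instruction_count) pairs.
    Returns a dict mapping stratum index -> list of member invocations.
    """
    counts = [c for _, c in invocations]
    lo, hi = min(counts), max(counts)
    width = (hi - lo) / num_strata or 1  # avoid zero width if all counts equal
    strata = defaultdict(list)
    for inv_id, count in invocations:
        idx = min(int((count - lo) / width), num_strata - 1)
        strata[idx].append((inv_id, count))
    return strata

def sample_representatives(strata, seed=0):
    """Pick one representative invocation per stratum, weighted by the
    stratum's share of total dynamic instructions, so whole-program behavior
    can be extrapolated from the simulated samples (illustrative weighting)."""
    rng = random.Random(seed)
    total = sum(c for members in strata.values() for _, c in members)
    samples = []
    for idx, members in sorted(strata.items()):
        representative = rng.choice(members)
        weight = sum(c for _, c in members) / total
        samples.append((representative, weight))
    return samples

if __name__ == "__main__":
    # Toy workload: 1000 kernel invocations with widely varying instruction counts.
    workload = [(i, random.randint(10_000, 5_000_000)) for i in range(1000)]
    strata = stratify_by_instruction_count(workload)
    for (inv_id, count), weight in sample_representatives(strata):
        print(f"simulate invocation {inv_id} ({count} insts), weight {weight:.3f}")

Only the selected representatives would be simulated in detail; the per-stratum weights are then used to extrapolate a whole-program estimate, which is what keeps simulation time low while bounding the within-stratum variability.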
Keywords
architecture simulation, sampling, GPU