Efficient GPU Spatial-Temporal Multitasking

IEEE Trans. Parallel Distrib. Syst. (2015)

Abstract
Heterogeneous computing nodes are now pervasive, and GPUs have emerged as a leading device for application acceleration. GPUs have tremendous computing potential for data-parallel applications, and their emergence has led to a proliferation of GPU-accelerated applications. This proliferation has in turn led to systems in which many applications compete for access to GPU resources, so efficient utilization of those resources is critical to system performance. Prior temporal multitasking techniques can be applied to GPU resources as well, but not all GPU kernels make full use of the GPU. There is, therefore, an unmet need for spatial multitasking in GPUs: resources used inefficiently by one kernel can instead be assigned to another kernel that can use them more effectively. In this paper we propose a software-hardware solution for efficient spatial-temporal multitasking and a software-based emulation framework for our system. We pair an efficient heuristic in software with hardware leaky-bucket-based thread-block interleaving to implement spatial-temporal multitasking. We demonstrate our techniques on various GPU architectures using nine representative benchmarks from the CUDA SDK. Our experiments on a Fermi GTX480 demonstrate performance improvements of up to 46% (average 26%) over sequential GPU task execution and up to 37% (average 18%) over default concurrent multitasking. Compared with the state-of-the-art Kepler K20 using Hyper-Q technology, our technique achieves up to 40% (average 17%) performance improvement over default concurrent multitasking.
Keywords
cuda sdk,default concurrent multitasking,sequential gpu task execution,spatial,gpu,kepler k20,gpu architecture,hardware leaky-bucket,parallel architectures,software based emulation framework,graphics processing units,resource allocation,fermi gtx480,multiprocessing systems,temporal,hyper-q technology,multitasking,gpu spatial-temporal multitasking,thread-block interleaving,multiprogramming,resource management,bandwidth,kernel,schedules,instruction sets