Improving GPGPU resource utilization through alternative thread block scheduling

HPCA(2014)

Abstract
High performance in GPGPU workloads is obtained by maximizing parallelism and fully utilizing the available resources. The thousands of threads are assigned to each core in units of CTAs (Cooperative Thread Arrays) or thread blocks - with each thread block consisting of multiple warps or wavefronts. The scheduling of the threads can have a significant impact on overall performance. In this work, we explore alternative thread block or CTA scheduling; in particular, we exploit the interaction between the thread block scheduler and the warp scheduler to improve performance. We explore two aspects of thread block scheduling - (1) LCS (lazy CTA scheduling), which restricts the maximum number of thread blocks allocated to each core, and (2) BCS (block CTA scheduling), where consecutive thread blocks are assigned to the same core. For LCS, we leverage a greedy warp scheduler to help determine the optimal number of thread blocks by only measuring the number of instructions issued, while for BCS, we propose an alternative warp scheduler that is aware of the “block” of CTAs allocated to a core. With LCS and the observation that the maximum number of CTAs does not necessarily maximize performance, we also propose mixed concurrent kernel execution, which enables multiple kernels to be allocated to the same core to maximize resource utilization and improve overall performance.
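The two placement ideas in the abstract can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the paper's hardware mechanism: the function names `assign_ctas` and `cap_resident`, the static up-front assignment, the block size of 2, and the fixed limit are all hypothetical choices made for illustration. In particular, the abstract says the LCS limit is derived from the number of instructions issued under a greedy warp scheduler; the sketch simply takes the limit as an input.

```python
def assign_ctas(num_ctas, num_cores, block_size=1):
    """Statically map CTA ids to cores (hypothetical helper).

    block_size=1 models plain round-robin placement.
    block_size>1 models BCS-style placement, where each run of
    `block_size` consecutive CTAs lands on the same core.
    """
    cores = [[] for _ in range(num_cores)]
    for cta in range(num_ctas):
        core = (cta // block_size) % num_cores
        cores[core].append(cta)
    return cores


def cap_resident(ctas_on_core, limit):
    """LCS-style cap (hypothetical helper): only the first `limit`
    CTAs are resident on the core at once; the rest wait their turn."""
    return ctas_on_core[:limit], ctas_on_core[limit:]


round_robin = assign_ctas(8, 2, block_size=1)
blocked = assign_ctas(8, 2, block_size=2)
resident, waiting = cap_resident(blocked[0], limit=2)

print(round_robin)        # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(blocked)            # [[0, 1, 4, 5], [2, 3, 6, 7]]
print(resident, waiting)  # [0, 1] [4, 5]
```

With round-robin placement, neighboring CTAs 0 and 1 end up on different cores; with a block size of 2 they share a core, which is the property a block-aware warp scheduler could then exploit.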
Keywords
block cta scheduling,scheduling,concurrency control,high-performance computing,optimal thread blocks,gpgpu workloads,bcs,graphics processing units,multi-threading,performance improvement,thread block scheduler,mixed concurrent kernel execution,resource allocation,gpgpu resource utilization improvement,thread-block scheduling,lazy cta scheduling,resource utilization maximization,greedy warp scheduler,lcs,parallelism maximization,multiple wavefronts,cooperative thread arrays,hardware,resource management,instruction sets,kernel,memory management