TEMP: thread batch enabled memory partitioning for GPU.

DAC (2016)

Abstract
As massive multi-threading in GPUs imposes tremendous pressure on memory subsystems, efficient bandwidth utilization becomes a key factor affecting GPU throughput. In this work, we propose thread batch enabled memory partitioning (TEMP) to improve GPU performance by improving memory bandwidth utilization. In particular, TEMP clusters multiple thread blocks that share the same set of pages into a thread batch and dispatches the entire thread batch to a streaming multiprocessor. TEMP separates the memory access streams of different thread batches through OS memory management, preserving the intrinsic locality of each thread batch and increasing memory access parallelism. Experimental results show that TEMP can obtain up to 10.3% performance improvement and 14.6% DRAM energy reduction compared to a state-of-the-art scheduler without any memory-side optimizations.
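The batching idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a 4 KiB page size, groups thread blocks only when their page sets match exactly, and assigns batches to multiprocessors and memory partitions round-robin. The names `pages_touched`, `form_thread_batches`, `dispatch`, and the `partition` field are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes; not taken from the paper


def pages_touched(addresses, page_size=PAGE_SIZE):
    """Return the set of page numbers covered by one thread block's addresses."""
    return frozenset(addr // page_size for addr in addresses)


def form_thread_batches(block_accesses, page_size=PAGE_SIZE):
    """Group thread blocks whose accesses fall on exactly the same set of pages.

    block_accesses maps a thread block id to the byte addresses it touches.
    Exact page-set matching is a simplification made for this sketch.
    """
    batches = defaultdict(list)
    for block_id, addresses in sorted(block_accesses.items()):
        batches[pages_touched(addresses, page_size)].append(block_id)
    return dict(batches)


def dispatch(batches, num_sms, num_partitions):
    """Send each whole batch to one multiprocessor and one memory partition.

    Round-robin assignment is an assumption; it only shows that batches are
    kept together and that their pages are kept apart from other batches.
    """
    ordered = sorted(batches.items(), key=lambda item: min(item[1]))
    plan = []
    for index, (pages, blocks) in enumerate(ordered):
        plan.append({
            "blocks": blocks,
            "pages": sorted(pages),
            "sm": index % num_sms,
            "partition": index % num_partitions,
        })
    return plan


if __name__ == "__main__":
    # Four hypothetical thread blocks: 0 and 1 touch page 0, 2 and 3 touch page 1.
    accesses = {0: [0, 64], 1: [128, 4000], 2: [4096, 8191], 3: [4100]}
    for entry in dispatch(form_thread_batches(accesses), num_sms=2, num_partitions=2):
        print(entry)
```

In this toy input, blocks 0 and 1 form one batch and blocks 2 and 3 form another, so each batch runs on its own multiprocessor and its pages land in a separate partition, which mirrors the locality and parallelism argument made in the abstract.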
Keywords
TEMP, thread batch enabled memory partitioning, multithreading, memory subsystems, GPU throughput, GPU performance, memory bandwidth utilization, thread blocks clustering, stream multiprocessor, memory access streams, OS memory management, memory access parallelism, DRAM energy reduction, graphics processing unit