Many-Thread Aware Prefetching Mechanisms for GPGPU Applications.

Jaekyu Lee, Nagesh B. Lakshminarayana, Hyesoon Kim, Richard W. Vuduc

MICRO (2010)

Abstract

We consider the problem of how to improve memory latency tolerance in massively multithreaded GPGPUs when the thread-level parallelism of an application is not sufficient to hide memory latency. One solution used in conventional CPU systems is prefetching, both in hardware and software. However, we show that straightforwardly applying such mechanisms to GPGPU systems does not deliver the expected performance benefits and can in fact hurt performance when not used judiciously. This paper proposes new hardware and software prefetching mechanisms tailored to GPGPU systems, which we refer to as many-thread aware prefetching (MT-prefetching) mechanisms. Our software MT-prefetching mechanism, called inter-thread prefetching, exploits the existence of common memory access behavior among fine-grained threads. For hardware MT-prefetching, we describe a scalable prefetcher training algorithm along with a hardware-based inter-thread prefetching mechanism. In some cases, blindly applying prefetching degrades performance. To reduce such negative effects, we propose an adaptive prefetch throttling scheme, which permits automatic GPGPU application- and hardware-specific adjustment. We show that adaptation reduces the negative effects of prefetching and can even improve performance. Overall, compared to state-of-the-art software and hardware prefetching, our MT-prefetching improves performance on our benchmarks by an average of 16% (software prefetching) and 15% (hardware prefetching).
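The abstract does not spell out how inter-thread prefetching computes its addresses, so the following is only a minimal sketch under stated assumptions: each thread reads element `a[tid]` of an array, and each thread issues a prefetch on behalf of the thread `distance` positions ahead of it. The function names, the 4-byte element size, and the default distance of one 32-thread warp are all hypothetical choices made for illustration, not details taken from the paper.

```python
# Illustrative model only: names, element size, and distance are assumptions.
WARP_SIZE = 32  # assumed warp width; not stated in the abstract


def demand_address(base, tid, elem_size=4):
    """Address that thread `tid` loads, assuming a[tid]-style indexing."""
    return base + tid * elem_size


def inter_thread_prefetch_address(base, tid, distance=WARP_SIZE, elem_size=4):
    """Address thread `tid` would prefetch on behalf of thread `tid + distance`."""
    return demand_address(base, tid + distance, elem_size)


def covered_fraction(num_threads, distance=WARP_SIZE):
    """Fraction of threads whose demand address some other thread prefetched."""
    prefetched = {
        inter_thread_prefetch_address(0, t, distance) for t in range(num_threads)
    }
    hits = sum(demand_address(0, t) in prefetched for t in range(num_threads))
    return hits / num_threads
```

In this toy model the first `distance` threads have no earlier thread to prefetch for them, so coverage grows with the number of threads relative to the distance. That is one way to read the abstract's point that the technique relies on common access behavior across many fine-grained threads.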

Keywords

computer graphic equipment, coprocessors, multi-threading, multiprocessing systems, storage management, CPU system, adaptive prefetch throttling, fine-grained thread, general-purpose GPU, hardware MT-prefetching, hardware-based interthread prefetching mechanism, many-thread aware prefetching mechanism, memory latency tolerance, multithreaded GPGPU, scalable prefetcher training algorithm, software MT-prefetching mechanism, thread-level parallelism, GPGPU, prefetch throttling, prefetching
