HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020

Abstract
Prior work on GPU cache coherence has shown that simple hardware- or software-based protocols can be more than sufficient. However, in recent years, features such as multi-chip modules have added deeper hierarchy and non-uniformity into GPU memory systems. GPU programming models have chosen to expose this non-uniformity directly to the end user through scoped memory consistency models. As a result, there is room to improve upon earlier coherence protocols that were designed only for flat single-GPU hierarchies and/or simpler memory consistency models. In this paper, we propose HMG, a cache coherence protocol designed for forward-looking multi-GPU systems. HMG strikes a balance between simplicity and performance: it uses a readily-implementable VI-like protocol to track coherence states, but it tracks sharers using a hierarchical scheme optimized for mitigating the bandwidth limitations of inter-GPU links. HMG leverages the novel scoped, non-multi-copy-atomic properties of modern GPU memory models, and it avoids the overheads of invalidation acknowledgments and transient states that were needed to support prior GPU memory models. On a 4-GPU system, HMG improves performance over a software-controlled, bulk invalidation-based coherence mechanism by 26% and over a non-hierarchical hardware cache coherence protocol by 18%, thereby achieving 97% of the performance of an idealized caching system.
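
To illustrate why hierarchical sharer tracking can relieve inter-GPU links, the toy model below counts how many invalidation messages cross those links on a write. It is only a sketch under stated assumptions, not the protocol as specified in the paper: the 4 GPUs x 4 GPU modules (GPMs) topology, the class names `FlatDirectory` and `HierarchicalDirectory`, and the message accounting are all hypothetical. It assumes that "hierarchical" means the home node records remote sharers at GPU granularity and lets the receiving GPU fan the invalidation out locally. A line is treated as Valid while its holder is in a sharer set and Invalid otherwise, and invalidations are sent without waiting for acknowledgments.

```python
from collections import defaultdict

GPMS_PER_GPU = 4  # assumed topology: 4 GPUs, each with 4 GPU modules (GPMs)


def gpu_of(gpm):
    """Map a global GPM id to the id of the GPU that contains it."""
    return gpm // GPMS_PER_GPU


class FlatDirectory:
    """Hypothetical baseline: the home node tracks every sharing GPM."""

    def __init__(self):
        self.sharers = defaultdict(set)  # line address -> set of GPM ids

    def read(self, addr, gpm, home_gpm):
        self.sharers[addr].add(gpm)  # requester now holds a Valid copy

    def write(self, addr, home_gpm):
        """Invalidate all sharers; return the number of inter-GPU messages."""
        targets = self.sharers.pop(addr, set()) - {home_gpm}
        # Invalidations are fire-and-forget: no acknowledgments are counted.
        return sum(1 for g in targets if gpu_of(g) != gpu_of(home_gpm))


class HierarchicalDirectory:
    """Hypothetical scheme: local GPMs plus one entry per remote GPU."""

    def __init__(self):
        self.local_gpms = defaultdict(set)   # addr -> sharers in the home GPU
        self.remote_gpus = defaultdict(set)  # addr -> remote GPUs with a copy
        self.gpu_level = defaultdict(set)    # (gpu, addr) -> GPMs in that GPU

    def read(self, addr, gpm, home_gpm):
        if gpu_of(gpm) == gpu_of(home_gpm):
            self.local_gpms[addr].add(gpm)
        else:
            self.remote_gpus[addr].add(gpu_of(gpm))
            self.gpu_level[(gpu_of(gpm), addr)].add(gpm)

    def write(self, addr, home_gpm):
        """Invalidate all sharers; return the number of inter-GPU messages."""
        self.local_gpms.pop(addr, None)  # stays on-package, not counted
        remote = self.remote_gpus.pop(addr, set())
        for gpu in remote:
            # One message crosses the link; that GPU invalidates its own GPMs.
            self.gpu_level.pop((gpu, addr), None)
        return len(remote)


if __name__ == "__main__":
    flat, hier = FlatDirectory(), HierarchicalDirectory()
    home = 0
    for gpm in range(4 * GPMS_PER_GPU):  # every GPM reads the same line
        flat.read(0x40, gpm, home)
        hier.read(0x40, gpm, home)
    print(flat.write(0x40, home), hier.write(0x40, home))  # prints "12 3"
```

In this worst case, where all 16 GPMs share one line, the flat scheme sends 12 invalidations over inter-GPU links and the hierarchical one sends 3. The model says nothing about latency, directory capacity, or the scoped consistency rules the abstract mentions.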
Keywords
HMG, GPU cache coherence, multichip modules, GPU memory systems, GPU programming models, scoped memory consistency models, single-GPU hierarchies, memory consistency models, 4-GPU system, nonhierarchical hardware cache coherence protocol, idealized caching system, GPU memory models, VI-like protocol, software-based protocols, hierarchical multiGPU systems, hardware-based protocols, interGPU links, forward-looking multi-GPU systems, coherence states tracking, bandwidth limitation mitigation, scoped nonmulticopy-atomic properties, invalidation acknowledgments, transient states, software-controlled bulk invalidation-based coherence mechanism