Differential-Matching Prefetcher for Indirect Memory Access.

International Symposium on High-Performance Computer Architecture (2024)

Abstract
Indirect memory access is a critical bottleneck for modern CPUs, especially in graph analysis and sparse linear algebra applications, where the values of one data array are used to generate the addresses fetched from another array. It often causes irregular data accesses with poor temporal and spatial locality that conventional hardware prefetchers struggle to capture. In many complex workloads, these indirect access patterns come in different types and are nested across multiple levels. Moreover, branch mispredictions further disturb these patterns, making them even harder to detect. As a result, existing hardware prefetchers are unable to fully prefetch complex indirect patterns. This paper proposes DMP, a low-cost hardware prefetcher that reduces memory latency in several representative irregular workloads. DMP targets four types of indirect memory access patterns: single, range, multi-level, and multi-way indirect access. DMP uses differential matching to identify an indirect access pattern together with its corresponding index stream. It then uses a flexible prefetching mechanism that dynamically adapts the prefetching degree to maintain prefetching coverage. We evaluate the performance, energy consumption, and transistor cost of DMP across various algorithms from the GAP, NAS, and HPCG benchmarks. On average, DMP improves performance by 1.8× (up to 5.6×) over state-of-the-art hardware prefetchers and achieves a 1.2× (up to 2.3×) speedup over Prodigy, a state-of-the-art compiler-based prefetcher. In addition, the proposed design is optimized to require only 0.9 KB of storage, making it feasible to integrate into current CPU designs.
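To make the idea of pairing an indirect access with its index stream more concrete, the sketch below shows one plausible reading of "differential matching" for the single-level pattern A[B[i]], assuming each target address has the form base + (B[i] << shift) for a power-of-two element size. Subtracting consecutive addresses cancels the unknown base, so the address deltas can be compared directly against the index-value deltas to find the shift. The function names (match_shift, infer_base, prefetch_addresses) and the max_shift and degree parameters are hypothetical illustrations, not the paper's hardware design, which operates on cache-level tables and handles range, multi-level, and multi-way patterns that this sketch ignores.

```python
from typing import List, Optional, Sequence


def match_shift(index_values: Sequence[int], target_addrs: Sequence[int],
                max_shift: int = 4) -> Optional[int]:
    """Hypothetical differential match: find a shift s such that every
    consecutive target-address delta equals the index-value delta << s.
    Working on deltas cancels the unknown array base address."""
    if len(index_values) != len(target_addrs) or len(index_values) < 2:
        return None
    for s in range(max_shift + 1):
        if all(
            (target_addrs[k + 1] - target_addrs[k])
            == ((index_values[k + 1] - index_values[k]) << s)
            for k in range(len(index_values) - 1)
        ):
            return s
    return None  # no consistent shift: not an indirect pattern under this model


def infer_base(index_value: int, target_addr: int, shift: int) -> int:
    """Recover the target array's base once the shift is known."""
    return target_addr - (index_value << shift)


def prefetch_addresses(base: int, shift: int, upcoming_indices: Sequence[int],
                       degree: int) -> List[int]:
    """Generate prefetch addresses for the next `degree` index values
    (degree is a hypothetical stand-in for an adaptive prefetch degree)."""
    return [base + (idx << shift) for idx in upcoming_indices[:degree]]


# Example: B = [5, 2, 9, 7] indexes an 8-byte-element array A at 0x1000.
idx = [5, 2, 9, 7]
addrs = [0x1000 + (v << 3) for v in idx]
s = match_shift(idx, addrs)        # shift 3, i.e. 8-byte elements
base = infer_base(idx[0], addrs[0], s)
print(s, hex(base), [hex(a) for a in prefetch_addresses(base, s, [1, 4, 6], 2)])
```

Matching on deltas rather than absolute values is what lets such a scheme work without knowing where A is allocated; once one shift fits the whole observed window, a single (index, address) pair is enough to recover the base and start issuing prefetches ahead of the index stream.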
Keywords
Indirect Access,Access Patterns,Index Values,Window Size,Complex Patterns,Deeper Level,High-performance Computing,Performance Gain,Betweenness Centrality,Evidential,Irregular Patterns,Sampling Window,Complicated Patterns,Kinds Of Patterns,Breadth-first Search,Indirect Detection,Graph Algorithms,Successful Matching,Hardware Overhead,L2 Cache,Cache Misses,Memory Wall,Area Mm2,First Search