RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs

IEEE Computer Architecture Letters (2023)

Abstract
The number of chip modules in a GPU is expected to keep increasing to meet the strong-scaling demands of parallel applications. In many-chip-module GPUs, memory access latency seriously limits performance because the transfer latency between GPU modules is very high and cannot easily be hidden by switching among ready threads. To address this problem, we propose RouteReplies, which enables a GPU module to fetch data from other GPU modules on the routing path. By leveraging data locality across GPU modules, RouteReplies significantly reduces memory access latency, since a memory request does not need to fetch data from the faraway memory partition. For a set of applications exhibiting varying degrees of inter-module locality, RouteReplies reduces memory access latency and increases performance by 54.8% on average (up to 364.8%).
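
The idea of serving a request from a module on the routing path can be illustrated with a toy model. The sketch below is not the paper's design: the `Module` class, the `hops_to_data` function, the linear four-module path, and the use of hop count as a stand-in for latency are all assumptions made for illustration. It only shows why a copy held by an intermediate module shortens the trip compared with always travelling to the home memory partition.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One GPU module with a set of locally cached line addresses (hypothetical model)."""
    module_id: int
    cached_lines: set = field(default_factory=set)


def hops_to_data(path, address):
    """Return (serving_module_id, hops) for a request routed along `path`.

    `path[0]` is the requesting module and `path[-1]` is the module that owns
    the home memory partition. Each intermediate module is checked for a cached
    copy; the first one that holds the line serves the reply. If none does, the
    request travels all the way to the home partition.
    """
    for hops, module in enumerate(path[1:], start=1):
        is_home = hops == len(path) - 1
        if is_home or address in module.cached_lines:
            return module.module_id, hops
    raise ValueError("path needs a requester and at least one other module")


# Assumed topology: request goes 0 -> 1 -> 2 -> 3, with module 3 as the home partition.
modules = [Module(i) for i in range(4)]
modules[1].cached_lines.add(0x40)  # module 1 happens to hold line 0x40

print(hops_to_data(modules, 0x40))  # (1, 1): served by module 1 after one hop
print(hops_to_data(modules, 0x80))  # (3, 3): no copy on the path, goes to home
```

In this toy setting the first request turns around after one hop instead of three, which is the kind of saving that inter-module data locality makes possible; real latency would also depend on link bandwidth, contention, and coherence handling, none of which are modelled here.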
Keywords
Data locality, GPUs, many-chip-module, sharing behavior