SnakeByte: A TLB Design with Adaptive and Recursive Page Merging in GPUs.

HPCA (2023)

Abstract
This paper presents SnakeByte, an address translation scheme for GPUs that dynamically manages variable-sized pages and maximizes TLB reach by recursively merging contiguous pages. Memory virtualization has become an integral part of GPUs, enhancing programmability and memory management efficiency. However, conventional memory virtualization, which relies on multi-level page tables cached in TLBs, does not give GPUs enough address translation coverage for their massive data volumes. SnakeByte implements a hardware-based address translation mechanism that recursively merges contiguous pages into larger page groups and thereby extends TLB coverage. It allows multiple equal-sized pages to be coalesced into a single page table entry (PTE). A bit vector records the validity of the pages to be merged, and a few additional bits indicate the size of the merged pages. If all pages covered by the PTE are allocated contiguously, the PTE is promoted so that it can be further coalesced into a larger page group. This recursive coalescing of contiguous pages lets SnakeByte handle variable-sized page groups with exponentially increasing TLB reach. Paired with a contiguity-aware memory allocator, SnakeByte can consolidate large contiguous address regions into a few TLB entries. Consequently, it significantly reduces TLB misses for large working sets in GPUs and achieves substantial performance improvements. Experimental results show that SnakeByte reduces the number of page table walks by 6.5x and improves GPU performance by 2.0x on average over the conventional paging scheme.
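The merging rule described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the base page size (`BASE_PAGE`), the bit-vector width (`FANOUT`), the level cap (`max_level`), and the names `Entry`, `group`, and `coalesce` are all hypothetical, and physical alignment of merged groups is not enforced. Each entry holds a validity bit vector and a level field; an entry whose bit vector is full is promoted and becomes a single slot of an entry one level up.

```python
from dataclasses import dataclass

BASE_PAGE = 4096           # assumed base page size in bytes
FANOUT = 8                 # assumed bit-vector width (slots per entry)
FULL = (1 << FANOUT) - 1   # bit vector with every slot valid


@dataclass
class Entry:
    level: int      # size field: each slot spans FANOUT**level base pages
    base_vpn: int   # first virtual page number covered by the entry
    base_pfn: int   # physical frame that slot 0 would map to
    valid: int      # bit i set if slot i is mapped contiguously

    @property
    def span(self):
        # Base pages addressable by the whole entry.
        return FANOUT ** (self.level + 1)

    @property
    def covered_pages(self):
        # Base pages actually mapped (valid slots times slot size).
        return bin(self.valid).count("1") * FANOUT ** self.level

    def translate(self, vpn):
        # Return the physical frame for vpn, or None if this entry misses.
        off = vpn - self.base_vpn
        slot = FANOUT ** self.level
        if 0 <= off < self.span and (self.valid >> (off // slot)) & 1:
            return self.base_pfn + off
        return None


def group(units, level):
    # Pack (vpn, pfn) units, each spanning FANOUT**level base pages,
    # into entries. Units share an entry only if they imply the same
    # base_pfn, i.e. they are physically contiguous.
    slot = FANOUT ** level
    span = slot * FANOUT
    entries = {}
    for vpn, pfn in sorted(units):
        base_vpn = vpn - vpn % span
        i = (vpn - base_vpn) // slot
        key = (base_vpn, pfn - i * slot)
        e = entries.setdefault(key, Entry(level, base_vpn, key[1], 0))
        e.valid |= 1 << i
    return list(entries.values())


def coalesce(mapping, max_level=2):
    # Recursively merge a {vpn: pfn} mapping; full entries are promoted.
    units, result = list(mapping.items()), []
    for level in range(max_level + 1):
        entries = group(units, level)
        if level == max_level:
            result += entries
            break
        result += [e for e in entries if e.valid != FULL]
        units = [(e.base_vpn, e.base_pfn) for e in entries if e.valid == FULL]
        if not units:
            break
    return result


if __name__ == "__main__":
    contiguous = {v: 1000 + v for v in range(64)}
    for e in coalesce(contiguous):
        print(e, "covers", e.covered_pages * BASE_PAGE, "bytes")
```

In this model, 64 contiguous base pages collapse into a single entry, and each promotion multiplies the reach of one slot by `FANOUT`, which is the exponential growth in TLB reach the abstract refers to. Unmapping one page leaves a partially valid level-0 entry next to a level-1 entry that covers the remaining full groups.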
Keywords
GPU, virtual memory, address translation, TLB