Delta Counter: Bandwidth-Efficient Encryption Counter Representation for Secure GPU Memory

IEEE Transactions on Dependable and Secure Computing (2024)

Abstract
The security of accelerators such as GPUs has recently gained significant attention: they are widely adopted in the cloud and are vulnerable to physical attacks. To support secure memory for GPUs, the critical performance bottleneck is the memory bandwidth contention between the regular data and the security metadata [1]. With counter-mode encryption, the security metadata includes counters, the Bonsai Merkle Tree (BMT), and message authentication codes (MACs). In this work, we focus on encryption counters given their impact on the counter and BMT traffic, while leveraging prior schemes [2], [3] to address the MAC traffic. We first analyze the characteristics of the encryption counters from a wide range of GPGPU benchmarks and make two key observations. (1) With the split counter scheme, the cache blocks in a large portion of the memory space, sometimes the entire GPU memory space, share the same major counter value. (2) The difference among minor counters is fairly limited. We then propose a novel scheme to reduce the encryption counter traffic. Our design includes (a) a highly compact counter representation and (b) a verification scheme to determine the correct minor counter values. In our design, we use a few on-chip registers to hold the major counters and use a (7-bit) base value along with two small (2-bit) deltas to represent the minor counters in a large memory chunk: one delta encodes the most frequent difference between the minor counters and the base, and the other encodes the maximal difference between a minor counter and the base. This way, for a large memory chunk (e.g., 16 kB), the counter overhead becomes nearly negligible (less than 2 B). We then leverage the existing MAC verification logic to verify the minor counters computed from the base and deltas. Our approach essentially trades off decryption and integrity-check latency for reduced counter-data traffic to take advantage of the latency-hiding nature of GPUs.
Compared to prior works on reducing counter traffic [4], our scheme handles more counter value patterns (as we don't restrict the counters to be the same in a memory chunk) and is more effective in reducing counter traffic. Our study also reveals that GPU memory data are typically compressible. As a result, we can co-locate the MACs with the compressed cache blocks, similar to [3]. Our experimental results show that our proposed delta counter scheme significantly reduces the storage and bandwidth overheads of encryption counters and achieves secure GPU memory with an average performance overhead of 2.01% compared to a GPU without security support. Our delta scheme is also compatible with SYNERGY [2], which leverages ECC chips to store MACs, and our achieved performance overhead is 2.83
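
The base-plus-deltas representation and the MAC-based recovery of a minor counter described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only the 7-bit base and the two 2-bit deltas come from the abstract; everything else is an assumption made for illustration: the helper names (`encode_chunk`, `candidates`, `recover_minor_counter`), the choice of the smallest minor counter as the base, the candidate search order, the use of HMAC-SHA256 as a stand-in MAC, and the omission of the major counter.

```python
import hashlib
import hmac
from collections import Counter

BASE_BITS = 7    # base width stated in the abstract
DELTA_BITS = 2   # width of each of the two deltas stated in the abstract
MAX_DELTA = (1 << DELTA_BITS) - 1


def encode_chunk(minor_counters):
    """Compress a chunk's minor counters into (base, frequent_delta, max_delta).

    Returns None when the values do not fit; how the real design handles
    that case is not described in the abstract.
    """
    base = min(minor_counters)  # assumption: base is the smallest counter
    deltas = [c - base for c in minor_counters]
    max_delta = max(deltas)
    if base >= (1 << BASE_BITS) or max_delta > MAX_DELTA:
        return None
    frequent_delta = Counter(deltas).most_common(1)[0][0]
    return base, frequent_delta, max_delta


def candidates(base, frequent_delta, max_delta):
    """Candidate minor counters, trying the most frequent delta first (assumed order)."""
    rest = [d for d in range(max_delta + 1) if d != frequent_delta]
    return [base + d for d in [frequent_delta] + rest]


def mac(key, address, counter, ciphertext):
    """Stand-in MAC: truncated HMAC-SHA256 over address, counter, and data."""
    msg = address.to_bytes(8, "little") + counter.to_bytes(2, "little") + ciphertext
    return hmac.new(key, msg, hashlib.sha256).digest()[:8]


def recover_minor_counter(key, address, ciphertext, stored_mac, encoded):
    """Return the first candidate whose MAC matches the stored MAC, else None."""
    for counter in candidates(*encoded):
        if hmac.compare_digest(mac(key, address, counter, ciphertext), stored_mac):
            return counter
    return None  # no candidate verifies: treated here as an integrity failure


# Usage: six blocks whose minor counters differ only slightly.
encoded = encode_chunk([5, 5, 6, 5, 8, 5])
key, address, data = b"k" * 16, 0x1000, bytes(32)
stored = mac(key, address, 8, data)  # this block was last written with counter 8
recovered = recover_minor_counter(key, address, data, stored, encoded)
```

Under these assumptions the per-chunk state is 7 + 2 + 2 = 11 bits, consistent with the "less than 2 B" figure in the abstract. The loop in `recover_minor_counter` also shows where the latency trade-off comes from: a block whose counter is not the most frequent one may need several MAC checks before the right counter is found.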
Keywords
GPU,GPU Security,Secure Memory