LATTE-CC: Latency Tolerance Aware Adaptive Cache Compression Management for Energy Efficient GPUs

Akhil Arunkumar,Shin-Ying Lee, Vignesh Soundararajan,Carole-Jean Wu

2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)（2018）

引用 22|浏览63

暂无评分

摘要

General-purpose GPU applications are significantly constrained by the efficiency of the memory subsystem and the availability of data cache capacity on GPUs. Cache compression, while is able to expand the effective cache capacity and improve cache efficiency, comes with the cost of increased hit latency. This has constrained the application of cache compression to mostly lower level caches, leaving it unexplored for L1 caches and for GPUs. Directly applying state-of-the-art high performance cache compression schemes on GPUs results in a wide performance variation from -52% to 48%. To maximize the performance and energy benefits of cache compression for GPUs, we propose a new compression management scheme, called LATTE-CC. LATTE-CC is designed to exploit the dynamically-varying latency tolerance feature of GPUs. LATTE-CC compresses cache lines based on its prediction of the degree of latency tolerance of GPU streaming multiprocessors and by choosing between three distinct compression modes: no compression, low-latency, and high-capacity. LATTE-CC improves the performance of cache sensitive GPGPU applications by as much as 48.4% and by an average of 19.2%, outperforming the static application of compression algorithms. LATTE-CC also reduces GPU energy consumption by an average of 10%, which is twice as much as that of the state-of-the-art compression scheme.

查看译文

关键词

Graphics Processing Units,Cache Compression,Latency Tolerance

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要