ReBEC: A replacement-based energy-efficient fault-tolerance design for associative caches

Xin Gao, Naiyuan Cui, Jiawei Nian, Zongnan Liang, Jiaxuan Gao, Hongjin Liu, Mengfei Yang

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE (2024)

Abstract
Severe environments such as space radiation can induce soft errors in processors and cause unexpected bit-flips. Error Detection and Correction (EDAC) is a crucial method for protecting the on-chip cache hierarchy against soft errors. However, conventional schemes employed in modern processors almost always rely on fixed EDAC protection designs and do not exploit the dynamic access behavior of cache blocks at runtime. In these schemes, a parity or Hamming coding format is applied in a predefined way and does not change during read/write accesses to the cache, which may induce unnecessary energy overhead. Surprisingly, we observe that several cache blocks, especially in low-level caches, are accessed only occasionally or not at all, and therefore do not require strong protection. In this paper we propose a configurable dynamic fault-tolerance cache design, called Replacement-Based EDAC Cache (ReBEC), to improve energy efficiency in modern reliable cache hierarchies. We divide the error protection space into three levels and leverage the access counters in replacement policies such as LRU and its derivatives to adaptively adjust the protection level of each cache block. To reduce energy consumption, newly inserted cache blocks are initially protected at the weak level and promoted adaptively when their access priority is elevated. The evaluation results show that our proposal outperforms the traditional schemes on the SPEC CPU benchmark suite while achieving comparable fault-tolerance capability. ReBEC reduces the dynamic energy consumption overhead of check schemes by up to 43.5%, with an average reduction of 23.6%, compared to the fixed protection design. Moreover, our proposal is orthogonal to previous EDAC schemes and can be reconfigured to further enhance the fault-tolerance capability.
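
The adaptive promotion idea in the abstract can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the authors' design: the names `Level` and `CacheSet`, the plain LRU ordering, and the rule "promote by one level on every hit" are all hypothetical, and the abstract does not say which code is used at each of the three levels.

```python
from collections import OrderedDict
from enum import IntEnum


class Level(IntEnum):
    """Three protection levels (names are assumptions, not from the paper)."""
    WEAK = 0
    MEDIUM = 1
    STRONG = 2


class CacheSet:
    """One set of an associative cache with LRU replacement.

    Assumption: a hit raises the block's protection by one level,
    standing in for "promoted when its access priority is elevated".
    """

    def __init__(self, ways):
        self.ways = ways
        # tag -> Level, ordered from LRU (first) to MRU (last)
        self.blocks = OrderedDict()

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.blocks:
            # Hit: promote protection (capped at STRONG) and mark as MRU.
            self.blocks[tag] = Level(min(self.blocks[tag] + 1, Level.STRONG))
            self.blocks.move_to_end(tag)
            return True
        # Miss: evict the LRU block if the set is full.
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)
        # Newly inserted blocks start at the weak level.
        self.blocks[tag] = Level.WEAK
        return False


demo = CacheSet(ways=2)
for t in ["A", "B", "A", "C", "A"]:
    demo.access(t)
print({t: lvl.name for t, lvl in demo.blocks.items()})
# prints {'C': 'WEAK', 'A': 'STRONG'}
```

In this toy trace, block `B` is evicted without ever being reused, so it never pays for more than the weak level, while the frequently reused block `A` ends up strongly protected. That is the intuition behind tying protection strength to replacement state.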
Keywords
Reliability, Cache, Error detection/correction codes, Fault-tolerance, Energy efficiency