Characterization and Exploration of Latch Checkers for Efficient RAS Protection.

DSN-S (2023)

Abstract
Reliability has been, and continues to be, a key consideration in the design of the IBM Z mainframe processors, and this has resulted in industry-leading performance with little-to-no downtime. In this paper, we analyze the various hardware reliability mechanisms that make the processor resilient to transient errors, and the checker architecture that enables their detection and correction. We characterize the error-checking logic in the processor based on a detailed analysis of the actual design. Using hardware measurements on a real Z processor, we then determine which error checkers are critical from a timing standpoint. We propose algorithms that optimize checker selection without affecting either the RAS coverage or the detection of timing errors. Finally, we examine further potential runtime optimizations of checkers based on the logic utilization in representative benchmarks.
Keywords
Reliability, Soft errors, Timing errors, Parity checkers