Analysis and Modeling of Memory Errors from Large-scale Field Data Collection

Taniya Siddiqua, Athanasios E. Papathanasiou, Arijit Biswas, Sudhanva Gurumurthi, Intel Corp, Teradata Aster

semanticscholar(2013)

Abstract
Main memory reliability plays a crucial role in overall system reliability. Unfortunately, our collective understanding of the rate, pattern, and impact of memory errors is inadequate and can hinder our ability to innovate new fault-tolerant designs. This paper presents an in-depth study of observed corrected error data from the main memory system of a large server population deployed in data centers. Our analysis includes multiple structures on the memory path, such as the memory controllers, busses, channels, and memory modules. Based on our observations, we present a taxonomy of potential faults in the memory path. We provide a detailed characterization of the faults and present novel insights into the nature of these faults and the errors that they induce.

Keywords: Reliability, DRAM, Errors, Faults, Data Analysis