Hardening In-memory Key-value Stores against ECC-uncorrectable Memory Errors

Tsuyoshi Shimomura, Hiroshi Yamada

2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (2022)

Abstract
Memory errors that error correction code (ECC) modules can detect but cannot fix, called ECC-uncorrectable errors, severely impact the availability of datacenter applications. In-memory key-value stores (KVSes) suffer more from ECC-uncorrectable errors than other applications because they typically allocate a large amount of memory and manage key-value pairs (KVs) and their running states in their address spaces. The standard recovery method is the all-clean approach, which reboots the damaged application. Rebooting eliminates all memory objects and significantly degrades the performance of the in-memory KVS. This paper presents a partial-surgery approach that forces in-memory KVSes to prune the damaged objects and reconstruct their internals from the undamaged ones. We prototyped our approach on memcached 1.4.39 and Redis 5.0.3 and conducted several experiments. The results show that the prototypes successfully recover from our injected memory errors and significantly outperform the conventional all-clean approach.
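
The idea of pruning damaged objects and rebuilding internals from undamaged ones can be illustrated with a toy model. The sketch below is not the authors' implementation and does not use memcached or Redis code. The names `ToyStore`, `Item`, `partial_surgery`, and `PAGE_SIZE` are hypothetical, addresses are simulated integers, and the sketch assumes the error is reported at page granularity.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed granularity at which an uncorrectable error is reported


@dataclass
class Item:
    addr: int     # simulated start address of the item in the store's heap
    size: int     # simulated size in bytes
    key: str
    value: bytes


class ToyStore:
    """Toy KVS: items live at simulated addresses; a dict stands in for the hash table."""

    def __init__(self):
        self.items = []      # all live items (stand-in for slab memory)
        self.index = {}      # key -> Item (stand-in for the hash index)
        self._next_addr = 0

    def set(self, key, value, size=1024):
        old = self.index.get(key)
        if old is not None:
            self.items.remove(old)  # drop the stale version of this key
        item = Item(self._next_addr, size, key, value)
        self._next_addr += size
        self.items.append(item)
        self.index[key] = item

    def get(self, key):
        item = self.index.get(key)
        return None if item is None else item.value

    def partial_surgery(self, bad_addr):
        """Prune items overlapping the damaged page, then rebuild the index."""
        page_start = (bad_addr // PAGE_SIZE) * PAGE_SIZE
        page_end = page_start + PAGE_SIZE
        survivors = [
            it for it in self.items
            if it.addr + it.size <= page_start or it.addr >= page_end
        ]
        lost = len(self.items) - len(survivors)
        self.items = survivors
        # Rebuild instead of patching: the old index may still point at pruned items.
        self.index = {it.key: it for it in survivors}
        return lost


store = ToyStore()
for i in range(8):
    store.set(f"k{i}", b"v")          # items at addresses 0, 1024, ..., 7168
lost = store.partial_surgery(bad_addr=5000)  # damages the page covering 4096..8191
```

In this toy run, only the four items on the damaged page are dropped, while an all-clean restart would discard all eight and leave the cache cold.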
Keywords
Key-value stores, ECC-uncorrectable memory errors, Recovery