CrossCheck: A Holistic Approach for Tolerating Crash-Faults and Arbitrary Failures

2016 12th European Dependable Computing Conference (EDCC), 2016

Abstract
High availability is no longer optional, as more and more Internet-based services provide economically or otherwise critical offerings. Traditionally, crash faults are addressed using state-machine replication (SMR), and critical data is selectively protected by checksums. Both techniques can be combined efficiently; however, large parts of a service remain susceptible to transient errors such as bit flips or more severe state corruptions. To address this weakness, and to reduce the laborious and non-trivial effort of identifying and selectively hardening the critical parts of a complex service, we propose CrossCheck, a holistic approach. CrossCheck extends the crash-fault protection of SMR to also tolerate arbitrary state corruptions, with a particular focus on multithreaded applications. This is achieved by a fine-grained state comparison and a precise recovery mechanism using fault-free replicas. The implementation uses aspect-oriented programming and therefore requires only minimal manual changes to the underlying software. In our evaluation, we show that a multithreaded key-value store can be made resilient to crashes and hardened against arbitrary state corruptions with moderate overhead.
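
The general idea of comparing replica state at fine granularity and repairing a corrupted entry from fault-free replicas can be illustrated with a minimal sketch. This is not CrossCheck's implementation, which the abstract does not detail. The function names `digest` and `cross_check`, the per-key granularity, the SHA-256 checksum, and the simple majority rule are all assumptions made for illustration.

```python
import copy
import hashlib
from collections import Counter


def digest(value):
    """Checksum of one state entry (assumed granularity: one key)."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def cross_check(replicas):
    """Compare replica states key by key and repair deviating entries.

    replicas: list of dicts with string keys, one dict per replica.
    Returns a list of (replica_index, key) pairs that were repaired.
    Raises RuntimeError if no majority agrees on an entry.
    """
    repaired = []
    quorum = len(replicas) // 2 + 1
    keys = set()
    for state in replicas:
        keys.update(state.keys())

    for key in sorted(keys):
        # None marks "key absent", so a lost entry is also detected.
        digests = [digest(r[key]) if key in r else None for r in replicas]
        winner, votes = Counter(digests).most_common(1)[0]
        if votes < quorum:
            raise RuntimeError("no majority for key %r" % key)
        source = digests.index(winner)  # one fault-free replica
        for i, d in enumerate(digests):
            if d == winner:
                continue
            if winner is None:
                # Majority has no such entry: drop the spurious one.
                replicas[i].pop(key, None)
            else:
                # Overwrite only the deviating entry, not the whole state.
                replicas[i][key] = copy.deepcopy(replicas[source][key])
            repaired.append((i, key))
    return repaired
```

For example, with three replicas where one holds a corrupted value for a single key, `cross_check` rewrites only that entry on that replica and leaves the rest of its state untouched.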
Keywords
fault tolerance,availability,client-server systems,multicore processing