Coverage Guided Fault Injection for Cloud Systems.

ICSE(2023)

引用 0|浏览19
暂无评分
摘要
To support high reliability and availability, modern cloud systems are designed to be resilient to node crashes and reboots. That is, a cloud system should gracefully recover from node crashes/reboots and continue to function. However, node crashes/reboots that occur under special timing can trigger crash recovery bugs that lie in incorrect crash recovery protocols and their implementations. To ensure that a cloud system is free from crash recovery bugs, some fault injection approaches have been proposed to test whether a cloud system can correctly recover from various crash scenarios. These approaches are not effective in exploring the huge crash scenario space without developers' knowledge. In this paper, we propose CrashFuzz, a fault injection testing approach that can effectively test crash recovery behaviors and reveal crash recovery bugs in cloud systems. CrashFuzz mutates the combinations of possible node crashes and reboots according to runtime feedbacks, and prioritizes the combinations that are prone to increase code coverage and trigger crash recovery bugs for smart exploration. We have implemented CrashFuzz and evaluated it on three popular open-source cloud systems, i.e., ZooKeeper, HDFS and HBase. CrashFuzz has detected 4 unknown bugs and 1 known bug. Compared with other fault injection approaches, CrashFuzz can detect more crash recovery bugs and achieve higher code coverage.
更多
查看译文
关键词
cloud system,crash recovery bug,fault injection,bug detection,fuzzing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要