FlyMC: Highly Scalable Testing of Complex Interleavings in Distributed Systems

Proceedings of the Fourteenth EuroSys Conference 2019 (2019)

Abstract
We present a fast and scalable testing approach for datacenter/cloud systems such as Cassandra, Hadoop, Spark, and ZooKeeper. The uniqueness of our approach is in its ability to overcome the path/state-space explosion problem in testing workloads with complex interleavings of messages and faults. We introduce three powerful algorithms: state symmetry, event independence, and parallel flips, which collectively make our approach on average 16x (up to 78x) faster than other state-of-the-art solutions. We have integrated our techniques with 8 popular datacenter systems, successfully reproduced 12 old bugs, and found 10 new bugs, all without random walks or manual checkpoints.
Keywords
Availability, Distributed Concurrency Bugs, Distributed Systems, Reliability, Software Model Checking