Replay without recording of production bugs for service oriented applications.

ASE(2018)

引用 17|浏览97
暂无评分
摘要
Short time-to-localize and time-to-fix for production bugs is extremely important for any 24x7 service-oriented application (SOA). Debugging buggy behavior in deployed applications is hard, as it requires careful reproduction of a similar environment and workload. Prior approaches for automatically reproducing production failures do not scale to large SOA systems. Our key insight is that for many failures in SOA systems (e.g., many semantic and performance bugs), a failure can automatically be reproduced solely by relaying network packets to replicas of suspect services, an insight that we validated through a manual study of 16 real bugs across five different systems. This paper presents Parikshan, an application monitoring framework that leverages user-space virtualization and network proxy technologies to provide a sandbox “debug” environment. In this “debug” environment, developers are free to attach debuggers and analysis tools without impacting performance or correctness of the production environment. In comparison to existing monitoring solutions that can slow down production applications, Parikshan allows application monitoring at significantly lower overhead.
更多
查看译文
关键词
Fault reproduction, live debugging
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要