Towards failure correlation for improved cloud application service resilience.

UCC Companion (2021)

Abstract
Autonomously dealing with disruptions is necessary for maintaining the quality of a cloud application service. A fault, error, or failure in any component across the application service stack can potentially disrupt service delivery. Fault localization and failure prediction are essential techniques in managing service failures. Emerging cloud computing paradigms are pushing application services to be built as loosely coupled distributed components that scale independently. However, such architectures expose the limitations of existing approaches to fault localization and failure prediction. Prevalent works focus on a specific cloud service architecture layer, a subset of service components, or specific fault types. These approaches restrict the view of a fault's impact on the application service and preclude more intelligent methods for localizing faults or predicting failures, and thus for efficiently dealing with service disruptions in an autonomous way. This paper considers the propagation of faults in multi-tiered architectures such as clouds and uses a real-world disruption scenario to emphasize the need to correlate faults across the service layers in order to gain insights for end-to-end fault analysis of cloud application services.
Keywords
Fault Propagation, Fault Analysis, Resilience