An empirical investigation of incident triage for online service systems

Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice (2019)

Abstract
Online service systems have become increasingly popular. During the operation of an online service system, incidents (unplanned interruptions or outages of the service) are inevitable. As an initial step of incident management, it is important to be able to automatically assign an incident report to a suitable team. We call this step incident triage; it can significantly affect the efficiency and accuracy of overall incident management. To better understand incident-triage practice in industry, we perform an empirical study of incident triage on 20 large-scale online service systems at Microsoft. We find that incorrect assignment of incident reports occurs frequently and incurs unnecessary cost, especially for high-severity incidents. For example, about 4.11% to 91.58% of incident reports are reassigned at least once, and the average increase in incident-triage time caused by reassignments is up to 10.16X. Considering the similarity between bug triage (automatically assigning bug reports to software developers) and incident triage, we then explore the applicability of typical bug-triage techniques to incident triage for online service systems. The results demonstrate that these bug-triage techniques can correctly assign incident reports to a certain extent, but they still need further improvement, especially for incident reports that are assigned incorrectly on the first attempt. We further discuss possible ways to improve the accuracy of incident triage based on the empirical study. To the best of our knowledge, we are the first to investigate incident triage in industrial practice. Our results are useful for both practitioners and researchers in developing methods and tools to improve current incident-triage practice for online service systems.
Keywords
Incident Triage, Online Service Systems, Empirical Study