On the Model Update Strategies for Supervised Learning in AIOps Solutions
arXiv (2023)
Abstract
AIOps (Artificial Intelligence for IT Operations) solutions leverage the
massive data produced during the operation of large-scale systems and machine
learning models to assist software engineers in their system operations. As
operation data produced in the field are constantly evolving due to factors
such as the changing operational environment and user base, the models in AIOps
solutions need to be constantly maintained after deployment. While prior works
focus on innovative modeling techniques to improve the performance of AIOps
models before releasing them into the field, when and how to update AIOps
models remain an under-investigated topic. In this work, we performed a case
study on three large-scale public operation datasets and empirically assessed
five different types of model update strategies for supervised learning regarding
their performance, updating cost, and stability. We observed that active model
update strategies (e.g., periodical retraining, concept drift guided
retraining, time-based model ensembles, and online learning) achieve better and
more stable performance than a stationary model. Particularly, applying
sophisticated model update strategies could provide better performance,
efficiency, and stability than simply retraining AIOps models periodically. In
addition, we observed that, although some update strategies can save model
training time, they significantly sacrifice model testing time, which could
hinder their application in AIOps solutions where operation data arrive at a
high pace and volume and where immediate inferences are required. Our findings
highlight that practitioners should consider the evolution of operation data
and actively maintain AIOps models over time. Our observations can also guide
researchers and practitioners in investigating more efficient and effective
model update strategies that fit in the context of AIOps.
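As a toy illustration of the contrast the abstract draws (this is not the paper's code or data), the sketch below simulates a one-dimensional operation-data stream whose decision boundary drifts over time. A stationary "model" (here, a simple mean threshold) is trained once on the first time period, while a periodically retrained model is refit on the most recent period before each new one arrives; the drifting stream, window sizes, and threshold model are all illustrative assumptions.

```python
import random

random.seed(0)

def make_window(center, n=500):
    """Synthetic labeled window: the true boundary sits at the drifting center."""
    xs = [random.gauss(center, 1.0) for _ in range(n)]
    ys = [1 if x > center else 0 for x in xs]
    return xs, ys

def fit_threshold(xs, ys):
    """Toy 'model': classify as 1 above the mean of the training features."""
    return sum(xs) / len(xs)

def accuracy(threshold, xs, ys):
    preds = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Six consecutive time periods whose concept (decision boundary) drifts upward.
centers = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
windows = [make_window(c) for c in centers]

stationary = fit_threshold(*windows[0])  # trained once, never updated
model = stationary                       # periodically retrained copy
stat_acc, retrain_acc = [], []
for i in range(1, len(windows)):
    xs, ys = windows[i]
    stat_acc.append(accuracy(stationary, xs, ys))
    retrain_acc.append(accuracy(model, xs, ys))  # model fit on previous period
    model = fit_threshold(xs, ys)                # periodical retraining

print(f"stationary mean accuracy: {sum(stat_acc) / len(stat_acc):.3f}")
print(f"periodic  mean accuracy: {sum(retrain_acc) / len(retrain_acc):.3f}")
```

Under this drift, the periodically retrained model stays close to the moving boundary while the stationary model degrades with each period, which mirrors the abstract's observation that active update strategies outperform a model that is never maintained after deployment.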