On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics – Empirical Study on Brown Build and Risk Prediction
CoRR (2023)
Abstract
Nowadays, software analytics tools using machine learning (ML) models to, for
example, predict the risk of a code change are well established. However, as
the goals of a project shift over time, and developers and their habits change,
the performance of said models tends to degrade (drift). Current
retraining practices typically require retraining a new model from scratch on a
large updated dataset when performance decay is observed, thus incurring a
computational cost; there is also no continuity between models, since the past
model is discarded and ignored while the new one is trained. Even though the
literature has taken interest in online learning approaches, those have rarely
been integrated and evaluated in industrial environments. This paper evaluates
the use of lifelong learning (LL) for industrial use cases at Ubisoft,
evaluating both the performance and the required computational effort in
comparison to the retraining-from-scratch approaches commonly used by the
industry. LL is used to continuously build and maintain ML-based software
analytics tools using an incremental learner that progressively updates the old
model using new data. To avoid so-called "catastrophic forgetting" of important
older data points, we adopt a replay buffer of older data, which still allows
us to drastically reduce the size of the overall training dataset, and hence
model training time.
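
The incremental update with a replay buffer described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation evaluated in the paper: the abstract does not name the learner or the buffer policy, so the logistic-regression learner (`IncrementalLogReg`), the reservoir-sampled `ReplayBuffer`, the `lifelong_step` helper, and all sizes and learning rates here are hypothetical choices made for the example.

```python
import math
import random


class ReplayBuffer:
    """Bounded sample of past examples (reservoir sampling is an assumption)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep each example seen so far with equal probability.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example


class IncrementalLogReg:
    """Tiny logistic regression updated in place by SGD (stand-in learner)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, batch, epochs=5):
        # The old weights are the starting point; nothing is reset.
        for _ in range(epochs):
            for x, y in batch:
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err


def lifelong_step(model, buffer, new_batch):
    """Update the existing model on new data mixed with replayed old data."""
    train = list(new_batch) + list(buffer.items)
    model.update(train)
    for example in new_batch:
        buffer.add(example)
    return len(train)


# Synthetic stream: label is 1 when the first feature is positive.
rng = random.Random(1)
model = IncrementalLogReg(n_features=2)
buffer = ReplayBuffer(capacity=50)
sizes = []
for _ in range(10):
    batch = []
    for _ in range(20):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        batch.append((x, 1 if x[0] > 0 else 0))
    sizes.append(lifelong_step(model, buffer, batch))
```

In this sketch each update trains on at most the new batch plus the buffer capacity, however many examples the stream has produced so far, which is the source of the training-time saving relative to retraining on the full accumulated dataset.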
Keywords
lifelong learning, software analytics models, empirical study