Studying logging practice in machine learning-based applications

Patrick Loic Foalem, Foutse Khomh, Heng Li

Information and Software Technology (2024)

Abstract
Context: Logging is a common practice in traditional software development. There have been multiple studies on the characteristics of logging in traditional software systems such as C/C++, Java, and Android applications. However, logging practices in Machine Learning-based (ML-based) applications are still not well understood. The size and complexity of data and models used in ML-based applications present unique challenges for logging.
Objective: In this paper, we aim to bridge this knowledge gap and provide insight into the logging practices in ML-based applications, making the first attempt to characterize current logging practices within a large number of open-source ML-based applications.
Method: We conducted an empirical study on 502 open-source ML applications to understand their logging practices, combining quantitative and qualitative analyses and a survey involving 31 practitioners.
Results: Our quantitative analysis reveals that logging in ML applications is less common than in traditional software, with info and warn log levels being popular. Top ML-specific logging libraries include MLflow, Tensorboard, Neptune, and W&B. Qualitatively, logging is used for data and model management, especially in model training. Our survey reinforces the importance of logging in experiment tracking, complementing our qualitative findings.
Conclusion: Our research carries significant implications. It reveals distinctive ML logging practices compared to traditional software. We have highlighted the prevalence of general-purpose logging libraries in ML code, indicating a potential gap in awareness regarding ML-specific logging tools. This insight benefits researchers and developers aiming to enhance ML project reproducibility and sets the stage for exploring ML-specific logging tools’ impact on machine learning system quality and trustworthiness.
Keywords
Logging practices, ML-based applications, Mining software repositories, Source code analysis