Flow-Bench: A Dataset for Computational Workflow Anomaly Detection

CoRR (2023)

Abstract
A computational workflow, or simply workflow, consists of tasks that must be executed in a specific order to attain a specific goal. In fields such as biology, chemistry, physics, and data science, these workflows are often complex and run in large-scale, distributed, and heterogeneous computing environments that are prone to failures and performance degradation. Anomaly detection for workflows is therefore an important paradigm that aims to identify unexpected behavior or errors in workflow execution. This task, crucial to improving the reliability of workflow executions, must be assisted by machine learning-based techniques. However, the application of such techniques is limited, in large part, by the lack of open datasets and benchmarks. To address this gap, we make the following contributions in this paper: (1) we systematically inject anomalies and collect raw execution logs from workflows executing on distributed infrastructures; (2) we summarize the statistics of the new datasets, as well as of a set of open datasets, and provide insightful analyses; (3) we benchmark unsupervised anomaly detection techniques by converting workflows into both tabular and graph-structured data. Our findings allow us to examine the effectiveness and efficiency of the benchmarked methods and to identify potential research opportunities for improvement and generalization. The dataset and benchmark code are available online under the MIT License for public use.
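The abstract does not spell out how a workflow is turned into tabular and graph-structured data. The following is a minimal sketch, assuming a hypothetical toy workflow whose job names, job types, and per-job features (runtime and bytes read) are invented for illustration; it is not the paper's pipeline or dataset schema. Task dependencies are kept as a graph (an edge list ordered with Kahn's algorithm), each job is flattened into one feature row, and anomalies are flagged with a simple unsupervised stand-in detector, the median/MAD modified z-score, rather than any of the methods benchmarked in the paper.

```python
from collections import defaultdict, deque
from statistics import median

# Hypothetical toy workflow (names, types, and numbers are invented for illustration):
# job name -> (job type, runtime in seconds, bytes read)
JOBS = {
    "stage_in": ("transfer", 10.0, 500.0),
    "align_1": ("align", 120.0, 2000.0),
    "align_2": ("align", 118.0, 2100.0),
    "align_3": ("align", 410.0, 1950.0),  # simulated slowdown (hypothetical injected anomaly)
    "align_4": ("align", 122.0, 2050.0),
    "merge": ("merge", 60.0, 8000.0),
}
# Task dependencies as (parent, child) edges: the graph-structured view.
EDGES = [("stage_in", f"align_{i}") for i in range(1, 5)] + [
    (f"align_{i}", "merge") for i in range(1, 5)
]


def topological_order(nodes, edges):
    """Kahn's algorithm: return the jobs in an order that respects every dependency."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(indegree):
        raise ValueError("workflow graph contains a cycle")
    return order


def to_table(jobs, edges):
    """Flatten the workflow into one feature row per job (the tabular view)."""
    n_parents = defaultdict(int)
    n_children = defaultdict(int)
    for parent, child in edges:
        n_children[parent] += 1
        n_parents[child] += 1
    rows = []
    for name in topological_order(jobs, edges):
        job_type, runtime, nbytes = jobs[name]
        rows.append({"name": name, "type": job_type, "runtime_s": runtime,
                     "bytes_read": nbytes, "n_parents": n_parents[name],
                     "n_children": n_children[name]})  # graph-derived columns
    return rows


def modified_z_scores(values):
    """Median/MAD-based z-scores (Iglewicz-Hoaglin), robust to the outliers being sought."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def detect(rows, features=("runtime_s", "bytes_read"), threshold=3.5):
    """Unsupervised: flag jobs whose features deviate strongly from peers of the same type."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row)
    flagged = set()
    for peers in groups.values():
        if len(peers) < 3:  # too few peers to estimate a baseline
            continue
        for feature in features:
            scores = modified_z_scores([r[feature] for r in peers])
            for row, z in zip(peers, scores):
                if abs(z) > threshold:
                    flagged.add(row["name"])
    return sorted(flagged)


if __name__ == "__main__":
    table = to_table(JOBS, EDGES)
    print(detect(table))  # ['align_3']
```

Comparing each job only with peers of the same type avoids mixing, for example, short transfer jobs with long compute jobs; types with fewer than three jobs are skipped because no meaningful baseline can be estimated for them.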
Keywords
workflow, dataset, flow-bench