On Irregularity Localization for Scientific Data Analysis Workflows

Computational Science – ICCS 2023 (2023)

Abstract
The paradigm shift towards data-driven science is massively transforming the scientific process. Scientists use exploratory data analysis to arrive at new insights. This requires them to specify complex data analysis workflows, which are compositions of data analysis functions. These functions encapsulate information extraction, integration, and model building through operations specified in linear algebra, relational algebra, and iterative control flow among them. A key challenge is to understand and act upon irregularities in these workflows, such as outliers in aggregations. Regardless of whether irregularities stem from errors or point to new insights, they must be localized and rationalized to ensure the correctness and overall trustworthiness of the workflow. We propose to automatically reduce a workflow’s input data while still observing some outcome of interest, thereby computing a minimal reproducible example to support workflow debugging. In essence, we reduce the problem to determining the input relevant to reproducing the irregularity. To that end, we present a portfolio of strategies tailored to data analysis workflows that operate on tabular data. We investigate their feasibility in terms of input reduction, and compare their effectiveness and efficiency in three characteristic cases.
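To make the idea of input reduction concrete, the following is a minimal sketch of one possible strategy: a delta-debugging-style (ddmin) reduction over table rows. It is not taken from the paper and is not claimed to be one of the authors' strategies; the names `reduce_rows`, `workflow`, and `is_irregular`, as well as the toy data and the threshold of 1000, are hypothetical and chosen only for illustration.

```python
from statistics import mean


def reduce_rows(rows, is_irregular):
    """ddmin-style reduction (illustrative assumption, not the paper's method).

    Returns a subset of `rows` that still triggers `is_irregular` and from
    which no single row can be removed without losing the irregularity.
    """
    assert is_irregular(rows), "the full input must show the irregularity"
    n = 2  # current granularity: target number of chunks
    while len(rows) >= 2:
        chunk = max(1, len(rows) // n)
        subsets = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            # All rows except those in the i-th chunk.
            complement = [r for j, s in enumerate(subsets) if j != i for r in s]
            if is_irregular(subset):
                rows, n, reduced = subset, 2, True
                break
            if is_irregular(complement):
                rows, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(rows):
                break  # already at single-row granularity: done
            n = min(len(rows), n * 2)  # refine granularity and retry
    return rows


def workflow(rows):
    """Toy workflow: mean revenue per region (a relational aggregation)."""
    groups = {}
    for r in rows:
        groups.setdefault(r["region"], []).append(r["revenue"])
    return {region: mean(values) for region, values in groups.items()}


def is_irregular(rows):
    """Outcome of interest: the 'north' mean is implausibly high."""
    return workflow(rows).get("north", 0) > 1000


rows = [
    {"region": "north", "revenue": 120},
    {"region": "north", "revenue": 95},
    {"region": "south", "revenue": 110},
    {"region": "north", "revenue": 99999},
    {"region": "south", "revenue": 130},
    {"region": "north", "revenue": 105},
]
minimal = reduce_rows(rows, is_irregular)
```

On this toy input, the reduction ends with the single row whose revenue is 99999, which serves as the minimal reproducible example for the outlier in the aggregation. The sketch treats the workflow as a black box and re-runs it on each candidate subset, so its cost grows with the number of workflow executions.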
Keywords
irregularity localization, data, scientific