Exploiting Execution Provenance To Explain Difference Between Two Data-Intensive Computations

2018 IEEE 14th International Conference on e-Science (e-Science 2018)

Abstract
Successful e-science requires control over variations of an experiment, typically encoded as a script or workflow, as well as the ability to transfer existing experiments to other environments in a reproducible way. Variations may be introduced either deliberately or because of inaccurate porting, for instance when the target environment does not satisfy the original dependencies on data and software libraries. Although provenance capture systems record these variations, the recorded provenance has not been exploited effectively to explain the differences between two computations. In this paper we address the problem of explaining the observed differences in the outcomes of two such experiment variations in terms of differences in their execution traces. While experiments may differ widely in structure and implementation, our hypothesis is that a general method for producing such explanations need only rely on the provenance of the experiment execution, for which a standard data model, W3C PROV, is available. To test this hypothesis in a concrete workflow setting, we have developed why-diff, an algorithm that matches two provenance traces derived from the execution of workflows that are variations of one another. We present the algorithm, show how it derives a delta graph that encodes the differences between the traces and thus provides the basis for generating human-readable explanations, and evaluate its performance in terms of the number of comparisons required during the matching process.
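The abstract does not spell out how why-diff matches traces. As a rough illustration only, the Python sketch below walks two simplified provenance traces backwards from their final outputs in lockstep. It pairs upstream nodes by label and collects mismatched nodes, or nodes present in only one trace, into a flat delta list while counting node comparisons. The `Node`/`Trace` model, the label-based pairing rule, and the function `diff_traces` are hypothetical simplifications of W3C PROV graphs. They are not the paper's data structures or its algorithm.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified PROV node (hypothetical model): an entity or an activity."""
    kind: str                                   # "entity" or "activity"
    label: str                                  # used to pair nodes across traces
    attrs: dict = field(default_factory=dict)   # e.g. tool version, data build


@dataclass
class Trace:
    nodes: dict      # node id -> Node
    upstream: dict   # node id -> ids it depends on (used / wasGeneratedBy)

    def parents(self, node_id):
        return self.upstream.get(node_id, [])


def diff_traces(a, b, root_a, root_b):
    """Walk both traces backwards from their outputs in lockstep.

    Returns (delta, comparisons): delta lists node pairs that differ or nodes
    found in only one trace; comparisons counts node-pair comparisons made.
    Assumes sibling labels are unique within each trace (hypothetical rule).
    """
    delta, comparisons = [], 0
    visited = set()
    queue = deque([(root_a, root_b)])
    while queue:
        na, nb = queue.popleft()
        if (na, nb) in visited:
            continue
        visited.add((na, nb))
        node_a, node_b = a.nodes[na], b.nodes[nb]
        comparisons += 1
        if (node_a.kind, node_a.label, node_a.attrs) != (node_b.kind, node_b.label, node_b.attrs):
            delta.append(("changed", na, nb))
        # Pair upstream nodes by label; leftovers exist in only one trace
        # and their own ancestors are not explored further.
        unmatched_b = {b.nodes[p].label: p for p in b.parents(nb)}
        for pa in a.parents(na):
            pb = unmatched_b.pop(a.nodes[pa].label, None)
            if pb is None:
                delta.append(("only_in_a", pa, None))
            else:
                queue.append((pa, pb))
        for pb in unmatched_b.values():
            delta.append(("only_in_b", None, pb))
    return delta, comparisons


# Two hypothetical workflow variants: B uses a different reference build
# and consumes an extra index file.
trace_a = Trace(
    nodes={
        "out": Node("entity", "result"),
        "align": Node("activity", "align", {"tool": "aligner-1.0"}),
        "reads": Node("entity", "reads"),
        "ref": Node("entity", "ref_genome", {"build": "GRCh37"}),
    },
    upstream={"out": ["align"], "align": ["reads", "ref"]},
)
trace_b = Trace(
    nodes={
        "out": Node("entity", "result"),
        "align": Node("activity", "align", {"tool": "aligner-1.0"}),
        "reads": Node("entity", "reads"),
        "ref": Node("entity", "ref_genome", {"build": "GRCh38"}),
        "index": Node("entity", "ref_index"),
    },
    upstream={"out": ["align"], "align": ["reads", "ref", "index"]},
)

delta, n = diff_traces(trace_a, trace_b, "out", "out")
print(delta)  # [('only_in_b', None, 'index'), ('changed', 'ref', 'ref')]
print(n)      # 4
```

Because each matched pair is compared once, the comparison count gives a simple cost measure for the matching step. A realistic matcher would need a more robust pairing rule than exact labels, for example when steps are renamed or repeated between workflow variants.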
Keywords
software libraries,external reference datasets,reproducibility crisis,e-science experiments,data-intensive computations,execution provenance