Collecting and Analyzing Provenance on Interactive Notebooks: When IPython Meets noWorkflow

TaPP (2015)

Abstract
Interactive notebooks help users explore code, run simulations, visualize results, and share them with other people. While these notebooks have been widely adopted in teaching as well as by scientists and data scientists who perform exploratory analyses, their provenance support is limited to the visualization of some intermediate results and code sharing. Once a user arrives at a result, it is hard, and sometimes impossible, to retrace the steps that led to it, since notebooks do not collect the provenance of intermediate results or of the environment. As a result, users must fill this gap using external tools such as workflow management systems. To overcome this limitation, we propose a new approach to capture provenance from notebooks. We build upon noWorkflow, a system that systematically collects provenance for Python scripts. With noWorkflow integrated into notebooks, provenance is captured automatically and transparently, allowing users to focus on their exploratory tasks within the notebook. In addition, they are able to analyze provenance information within the notebook, to both reason about and debug their work, using visualizations, SQL queries, Prolog queries, and Python code.