Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows
ICDE(2012)
Abstract
Panda (for Provenance and Data) is a system that supports the creation and execution of data-oriented workflows, with automatic provenance generation and built-in provenance tracing operations. Workflows in Panda are arbitrary acyclic graphs containing both relational (SQL) processing nodes and opaque processing nodes programmed in Python. For both types of nodes, Panda generates logical provenance, that is, provenance information stored at the processing-node level, and uses the generated provenance to support record-level backward tracing and forward tracing operations. In our demonstration we use Panda to integrate, process, and analyze actual education data from multiple sources. We specifically demonstrate how Panda's provenance generation and tracing capabilities help with workflow debugging and with drilling down on specific results of interest.
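To make the idea of logical provenance concrete, the following is a minimal Python sketch. It is not Panda's API. The names `Node`, `contributes`, `backward_trace`, and `forward_trace`, as well as the toy student records, are assumptions introduced purely for illustration. The sketch models provenance as a single predicate stored per processing node, and answers record-level tracing queries by evaluating that predicate rather than by storing per-record links.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Node:
    """A hypothetical processing node (illustrative, not Panda's API)."""
    name: str
    # The node's processing logic: input records -> output records.
    transform: Callable[[List[Record]], List[Record]]
    # Node-level (logical) provenance: does input record i contribute
    # to output record o? Stored once per node, not once per record.
    contributes: Callable[[Record, Record], bool]


def backward_trace(node: Node, inputs: List[Record], output: Record) -> List[Record]:
    """Return the input records that contributed to one output record."""
    return [i for i in inputs if node.contributes(i, output)]


def forward_trace(node: Node, outputs: List[Record], input_rec: Record) -> List[Record]:
    """Return the output records that one input record contributed to."""
    return [o for o in outputs if node.contributes(input_rec, o)]


def avg_by_school(rows: List[Record]) -> List[Record]:
    """Toy aggregation: average score per school."""
    scores: Dict[str, List[float]] = {}
    for r in rows:
        scores.setdefault(r["school"], []).append(r["score"])
    return [{"school": s, "avg_score": sum(v) / len(v)} for s, v in scores.items()]


# For a group-by aggregation, matching on the grouping attribute is
# enough to describe provenance at the node level.
agg = Node(
    name="avg_by_school",
    transform=avg_by_school,
    contributes=lambda i, o: i["school"] == o["school"],
)

# Made-up example records.
students = [
    {"student": "a", "school": "North", "score": 80},
    {"student": "b", "school": "North", "score": 90},
    {"student": "c", "school": "South", "score": 70},
]

results = agg.transform(students)

# Drill down: which students are behind the "North" average?
north_sources = backward_trace(agg, students, results[0])

# Debugging: which results would a suspect input record have affected?
affected = forward_trace(agg, results, students[2])
```

In this sketch, backward tracing from the "North" result returns students a and b, and forward tracing from student c returns only the "South" result. In a multi-node workflow, the same per-node step would be applied repeatedly along the graph.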
Keywords
logical provenance, data-oriented workflows, provenance generation, built-in provenance, actual education data, processing node, provenance-based debugging, opaque processing, automatic provenance generation, provenance information, cyclic graph, data integration, graphic user interface, debugging, graphical user interfaces, electronic publishing, data mining, data processing, data analysis, information services