DCM Explorer: A Tool to Support Transparent Data Cleaning through Provenance Exploration

Proceedings of the 14th International Workshop on the Theory and Practice of Provenance, TaPP 2022 (2022)

Abstract
Data cleaning and preparation are essential phases of data science and machine learning (ML) workflows. Unfortunately, data cleaning processes are rarely well documented, even though they are error-prone and often involve hundreds of individual transformation steps. We have developed DCM (Data Cleaning Model), which captures provenance information for data cleaning. In this paper, we present DCM Explorer, a companion tool for DCM to explore and use data cleaning provenance. With DCM Explorer, a user can, for example, query and visualize the data cleaning workflows that are "hidden" in recorded provenance information, show different states of the data as it underwent cleaning, and explore an individual cell's history. Through query-driven provenance reports, DCM Explorer adds valuable process documentation, making data cleaning more transparent, self-explanatory, and reusable.
Keywords
Data cleaning, scientific workflows, transparency, data provenance