Detangler: Helping Data Scientists Explore, Understand, and Debug Data Wrangling Pipelines

2023 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC (2023)

Abstract
Data scientists spend significant time on data wrangling, a process that involves cleaning, shaping, and preprocessing data. Wrangling requires meticulous exploration and backtracking to assess data quality, because data scientists must apply and validate numerous chains of data transformations. This makes the process tedious and error-prone. In this paper, we present Detangler, an interactive tool within the RStudio IDE that helps data scientists identify and debug data quality issues and wrangling code. The design of Detangler is informed by formative interviews. It presents data scientists with (i) insights into potential data quality issues and (ii) always-on visual summaries of the effects of individual data transformations, enabling interactive exploration of both the data and the wrangling code. Through a laboratory study with 18 data scientists, triangulated with telemetry data, we find that Detangler improves the exploration and debugging of data smells and data wrangling code. We discuss design implications for future tools for data science programming.
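The idea of per-transformation summaries can be illustrated with a small sketch. Detangler itself runs inside RStudio, and the Python below is not its implementation. It only shows the general technique under stated assumptions:

- A table is modeled as a list of dicts.
- A pipeline is a list of named steps.
- The helpers `summarize`, `run_with_summaries`, and `flag_smells` are hypothetical names introduced for illustration.
- The two "smells" flagged here (dropped rows, rising missing-value counts) are simple stand-ins, not the checks the paper uses.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical data model: a table is a list of rows, each row a dict.
Row = Dict[str, Optional[object]]
Table = List[Row]
Step = Callable[[Table], Table]


def summarize(table: Table) -> Dict[str, object]:
    """Summarize a table: row count, sorted column names, and per-column
    missing counts (a value counts as missing if it is None or absent)."""
    columns = sorted({col for row in table for col in row})
    missing = {col: sum(1 for row in table if row.get(col) is None) for col in columns}
    return {"rows": len(table), "columns": columns, "missing": missing}


def run_with_summaries(table: Table, steps: List[Tuple[str, Step]]) -> List[Dict[str, object]]:
    """Apply each named step in order, recording a summary of the input
    and of the intermediate result after every step."""
    trace = [{"step": "input", **summarize(table)}]
    for name, step in steps:
        table = step(table)
        trace.append({"step": name, **summarize(table)})
    return trace


def flag_smells(trace: List[Dict[str, object]]) -> List[str]:
    """Compare consecutive summaries and flag two illustrative smells:
    a step that drops rows, or one that increases missing values in a column."""
    flags = []
    for prev, curr in zip(trace, trace[1:]):
        if curr["rows"] < prev["rows"]:
            flags.append(f"{curr['step']}: dropped {prev['rows'] - curr['rows']} row(s)")
        for col, n in curr["missing"].items():
            if n > prev["missing"].get(col, 0):
                flags.append(f"{curr['step']}: missing values in '{col}' rose to {n}")
    return flags


# Usage example with a made-up three-row table and a two-step pipeline.
data = [
    {"name": "a", "age": 31, "city": "Oslo"},
    {"name": "b", "age": None, "city": "Lima"},
    {"name": "c", "age": 45, "city": None},
]
steps = [
    ("drop_missing_age", lambda t: [r for r in t if r["age"] is not None]),
    ("add_age_group", lambda t: [{**r, "group": "40+" if r["age"] >= 40 else "<40"} for r in t]),
]
trace = run_with_summaries(data, steps)
for entry in trace:
    print(entry["step"], entry["rows"], entry["missing"])
print(flag_smells(trace))
```

How the sketch works:

- Recording a summary after every step, instead of only at the end, makes the effect of each individual transformation visible.
- That per-step visibility is the property the abstract attributes to Detangler's always-on summaries.
- A real tool would render these summaries visually rather than as printed dicts.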
Keywords
data science, data wrangling, programming