Towards Automatic Data Format Transformations: Data Wrangling at Scale

COMPUTER JOURNAL(2019)

引用 9|浏览28
暂无评分
摘要
Data wrangling is the process whereby data are cleaned and integrated for analysis. Data wrangling, even with tool support, is typically a labour intensive process. One aspect of data wrangling involves carrying out format transformations on attribute values, for example so that names or phone numbers are represented consistently. Recent research has developed techniques for synthesizing format transformation programs from examples of the source and target representations. This is valuable, but still requires a user to provide suitable examples, something that may be challenging in applications in which there are huge datasets or numerous data sources. In this paper, we investigate the automatic discovery of examples that can be used to synthesize format transformation programs. In particular, we propose two approaches to identifying candidate data examples and validating the transformations that are synthesized from them. The approaches are evaluated empirically using datasets from open government data.
更多
查看译文
关键词
format transformations,data wrangling,program synthesis
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要