Learning Data Transformations with Minimal User Effort

2019 IEEE International Conference on Big Data (Big Data), 2019

Abstract
Data collected from heterogeneous sources are often inconsistent in format and must be transformed before they can be used. A major limitation of existing approaches is their dependence on parallel input-output data to learn the transformations. However, parallel data are not always available, and annotation requires excessive human interaction because of format diversity. These approaches are therefore hard to apply to large-scale real-world problems. To address this issue, we introduce UDATA, a novel unsupervised system for non-parallel data transformation. Because the data to be transformed usually share common syntactic patterns, UDATA discovers these patterns from input and output examples and synthesizes the transformations between them. Moreover, in UDATA, transformation results are verified by an active learning model, and ambiguous results are reported to users for labeling. UDATA achieves accuracy close to that of state-of-the-art supervised systems without the need for any labeled data.
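The abstract does not specify how syntactic patterns are represented, so the following is only a minimal sketch of the general idea, not UDATA's actual method. It assumes a coarse, hypothetical token scheme in which runs of digits become `<D>`, runs of ASCII letters become `<A>`, runs of whitespace become `<S>`, and all other characters are kept literally. The function names `syntactic_pattern` and `pattern_counts` are illustrative and do not come from the paper.

```python
import re
from collections import Counter


def syntactic_pattern(value: str) -> str:
    """Map a string to a coarse syntactic pattern (hypothetical token scheme)."""
    tokens = []
    # Split into digit runs, ASCII letter runs, whitespace runs, or single other chars.
    for match in re.finditer(r"[0-9]+|[A-Za-z]+|\s+|.", value):
        tok = match.group()
        if tok.isdigit():
            tokens.append("<D>")
        elif tok.isalpha():
            tokens.append("<A>")
        elif tok.isspace():
            tokens.append("<S>")
        else:
            tokens.append(tok)  # punctuation and symbols are kept as literals
    return "".join(tokens)


def pattern_counts(values):
    """Count how many values share each syntactic pattern."""
    return Counter(syntactic_pattern(v) for v in values)


if __name__ == "__main__":
    dates = ["2019-12-09", "2020-01-15", "12/09/2019", "Dec 9, 2019"]
    for pattern, count in pattern_counts(dates).items():
        print(pattern, count)
```

Grouping values by such patterns on the input side and on the output side gives a small set of pattern pairs between which a transformation could then be searched for, without requiring row-aligned input-output examples.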
Keywords
UDATA, data transformations, minimal user effort, heterogeneous sources, data format, parallel input-output data, excessive human interaction, format diversity, nonparallel data transformation, transforming data, common syntactic patterns, transformation results, active learning model, ambiguous results