Building Data Civilizer Pipelines with an Advanced Workflow Engine

2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE)(2018)

引用 13|浏览1
暂无评分
摘要
In order for an enterprise to gain insight into its internal business and the changing outside environment, it is essential to provide the relevant data for in-depth analysis. Enterprise data is usually scattered across departments and geographic regions and is often inconsistent. Data scientists spend the majority of their time finding, preparing, integrating, and cleaning relevant data sets. Data Civilizer is an end-to-end data preparation system. In this paper, we present the complete system, focusing on our new workflow engine, a superior system for entity matching and consolidation, and new cleaning tools. Our workflow engine allows data scientists to author, execute and retrofit data preparation pipelines of different data discovery and cleaning services. Our end-to-end demo scenario is based on data from the MIT data warehouse and e-commerce data sets.
更多
查看译文
关键词
Data Cleaning,Data Integration,Data Discovery
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要