Transform-data-by-example (TDE): an extensible search engine for data transformations

Hosted Content(2018)

引用 65|浏览132
暂无评分
摘要
AbstractToday, business analysts and data scientists increasingly need to clean, standardize and transform diverse data sets, such as name, address, date time, and phone number, before they can perform analysis. This process of data transformation is an important part of data preparation, and is known to be difficult and time-consuming for end-users.Traditionally, developers have dealt with these longstanding transformation problems using custom code libraries. They have built vast varieties of custom logic for name parsing and address standardization, etc., and shared their source code in places like GitHub. Data transformation would be a lot easier for end-users if they can discover and reuse such existing transformation logic.We developed Transform-Data-by-Example (TDE), which works like a search engine for data transformations. TDE "indexes" vast varieties of transformation logic in source code, DLLs, web services and mapping tables, so that users only need to provide a few input/output examples to demonstrate a desired transformation, and TDE can interactively find relevant functions to synthesize new programs consistent with all examples. Using an index of 50K functions crawled from GitHub and Stackoverflow, TDE can already handle many common transformations not currently supported by existing systems. On a benchmark with over 200 transformation tasks, TDE generates correct transformations for 72% tasks, which is considerably better than other systems evaluated. A beta version of TDE for Microsoft Excel is available via Office store1. Part of the TDE technology also ships in Microsoft Power BI.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要