Data File Layout Inference Using Content-Based Oracles

Computational Science and Engineering (2013)

Abstract
Data file layout inference is the problem of identifying the organizational characteristics of a structured text file, in which every record shares the same structural properties. These properties include character encoding, record length, field length (indicated by delimiting characters or by a fixed length), field position, and field semantic content. In this paper, this information is referred to as the layout of a file. Layout information is required to extract, transform, and load files into workflows in data warehouse and data mining applications. Although it is a common need, layout inference remains a manual, labor-intensive process that requires human expertise whenever a file's layout is unavailable, miscommunicated, or changed. This paper proposes an automated methodology for solving the layout inference problem by discovering the metadata of a structured text file, and reports the results of a prototype system on real data files from a customer data integration and management application.
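To make the notion of a layout concrete, the following is a minimal sketch of how two of the listed properties (record length and field delimiter) might be guessed from a sample of records. It is not the content-based oracle method proposed in the paper, which the abstract does not detail; the names `LayoutGuess`, `guess_layout`, and `CANDIDATE_DELIMITERS` are hypothetical, and the heuristic (a delimiter must occur the same nonzero number of times in every record) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical candidate set; a real system would consider many more characters.
CANDIDATE_DELIMITERS = [",", "\t", "|", ";"]


@dataclass
class LayoutGuess:
    record_length: Optional[int]  # set only if every record has the same length
    delimiter: Optional[str]      # set only if a candidate splits every record alike
    field_count: Optional[int]    # number of fields implied by the delimiter


def guess_layout(records: List[str]) -> LayoutGuess:
    """Guess fixed record length and delimiter from sample records (illustrative)."""
    # Fixed-length records: all sampled records share a single length.
    lengths = {len(r) for r in records}
    record_length = lengths.pop() if len(lengths) == 1 else None

    # Delimited records: a candidate occurs the same nonzero number of times
    # in every record. Prefer the candidate that yields the most fields.
    delimiter, field_count = None, None
    for cand in CANDIDATE_DELIMITERS:
        counts = {r.count(cand) for r in records}
        if len(counts) == 1:
            n = counts.pop()
            if n > 0 and (field_count is None or n + 1 > field_count):
                delimiter, field_count = cand, n + 1

    return LayoutGuess(record_length, delimiter, field_count)


if __name__ == "__main__":
    print(guess_layout(["a|b|c", "dd|e|ff"]))
    print(guess_layout(["AB123", "CD456"]))
```

A purely syntactic heuristic like this cannot recover field positions within fixed-length records or the semantic content of fields, which is the part of the problem the abstract attributes to content-based analysis.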
Keywords
structured text file, text file share, real data file, layout inference problem, content-based oracles, customer data integration, layout inference, data file layout inference, various data warehouse, load file, structural layout information, data mining application, data mining