Data File Layout Inference Using Content-Based Oracles

Computational Science and Engineering (2013)

Data file layout inference refers to the problem of identifying the organizational characteristics of a structured text file, in which every record shares the same structural properties. These properties include character encoding, record length, field length (indicated by delimiting characters or fixed length), field position, and field semantic content. In this paper, this information is referred to as the layout of a file. Structural layout information is required to extract, transform, and load files into workflows within various data warehouse and data mining applications. Although commonly needed, layout inference is a manual, labor-intensive process that requires human expertise whenever a file's layout is unavailable, miscommunicated, or changed. This paper proposes an automated methodology for solving the layout inference problem by discovering the metadata of a structured text file, and reports the results of a prototype system on real data files from a customer data integration and management application.
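To make the notion of a layout concrete, the following is a minimal Python sketch of how a delimiter, a fixed record length, and field semantics might be inferred from file content alone. It is not the paper's method: the candidate delimiter list, the regular-expression "oracles", and every function name here (`infer_delimiter`, `is_fixed_length`, `label_field`, `infer_layout`) are assumptions made for illustration only.

```python
import re

# Hypothetical candidate delimiters; a real system would likely consider more.
CANDIDATE_DELIMITERS = [",", "\t", "|", ";"]

# Hypothetical content-based oracles: each one recognizes a kind of field
# purely from the text of its values. The patterns are deliberately simple.
ORACLES = {
    "zip_code": re.compile(r"\d{5}(-\d{4})?"),
    "phone": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def infer_delimiter(lines):
    """Return the candidate that occurs the same nonzero number of times
    on every line (preferring the most frequent one), or None."""
    best, best_count = None, 0
    for delim in CANDIDATE_DELIMITERS:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1:
            (count,) = counts
            if count > best_count:
                best, best_count = delim, count
    return best


def is_fixed_length(lines):
    """True when every record has the same number of characters."""
    return len({len(line) for line in lines}) == 1


def label_field(values):
    """Return the name of the first oracle that fully matches every value
    in the column, or 'unknown' when none does."""
    for name, pattern in ORACLES.items():
        if all(pattern.fullmatch(value) for value in values):
            return name
    return "unknown"


def infer_layout(lines):
    """Combine the checks above into a small layout description."""
    delim = infer_delimiter(lines)
    fields = []
    if delim is not None:
        rows = [line.split(delim) for line in lines]
        fields = [label_field(column) for column in zip(*rows)]
    return {
        "delimiter": delim,
        "fixed_length": is_fixed_length(lines),
        "fields": fields,
    }


# Made-up sample records, not data from the paper.
sample = [
    "Ann Lee|ann@example.com|72201|501-555-0100",
    "Bob Ray|bob@example.org|72032-1234|(501) 555-0199",
]
print(infer_layout(sample))
```

The sketch treats each oracle as an all-or-nothing test over a column. A practical system would more likely score partial matches and tolerate dirty records, and it would also need to detect character encoding, which this example leaves out.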
structured text file, text file share, real data file, layout inference problem, content-based oracles, customer data integration, layout inference, data file layout inference, various data warehouse, load file, structural layout information, data mining application, data mining