An Approach for Testing the Extract-Transform-Load Process in Data Warehouse Systems.

IDEAS 2018: 22nd International Database Engineering & Applications Symposium, Villa San Giovanni, Italy, June 2018

Abstract
The Extract-Transform-Load (ETL) process in data warehousing involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex transformations involving sources and targets that use different schemas, databases, and technologies, which make ETL implementations fault-prone. In this paper, we present an approach for validating ETL processes using automated balancing tests that check for various types of discrepancies between the source and target data. We formalize three categories of properties that must be checked during testing, namely completeness, consistency, and syntactic validity. Our approach uses the rules provided in the ETL specifications to generate source-to-target mappings, from which balancing test assertions are generated for each property. We evaluated the approach on a real-world health data warehouse project and revealed 11 previously undetected faults. Using mutation analysis, we demonstrated that our auto-generated assertions can detect faults in the data inside the target data warehouse.
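To make the idea of a balancing test concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical source table `src_patient`, a hypothetical target table `tgt_person`, and one hand-written mapping rule (year of birth is derived from the source birth date). The readings of the three properties used here (record counts for completeness, rule agreement for consistency, an allowed value set for syntactic validity) are illustrative assumptions; the abstract does not give the formal definitions.

```python
import sqlite3


def build_demo(faulty=False):
    """Create an in-memory source/target pair (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_patient (id INTEGER, birth_date TEXT, sex TEXT);
        CREATE TABLE tgt_person (person_id INTEGER, year_of_birth INTEGER,
                                 gender TEXT);
        INSERT INTO src_patient VALUES
            (1, '1980-05-17', 'F'), (2, '1975-11-02', 'M'),
            (3, '1990-01-30', 'F');
    """)
    if faulty:
        # Simulated ETL faults: a dropped record, a wrong year, a bad code.
        rows = [(1, 1981, "F"), (2, 1975, "X")]
    else:
        rows = [(1, 1980, "F"), (2, 1975, "M"), (3, 1990, "F")]
    conn.executemany("INSERT INTO tgt_person VALUES (?, ?, ?)", rows)
    return conn


def run_balancing_tests(conn):
    """Return a dict mapping each property name to True (pass) or False."""
    cur = conn.cursor()
    results = {}

    # Completeness (assumed reading): source and target record counts match.
    src = cur.execute("SELECT COUNT(*) FROM src_patient").fetchone()[0]
    tgt = cur.execute("SELECT COUNT(*) FROM tgt_person").fetchone()[0]
    results["completeness"] = src == tgt

    # Consistency (assumed reading): each target value agrees with the
    # value the mapping rule derives from the matching source record.
    mismatches = cur.execute("""
        SELECT COUNT(*)
        FROM src_patient s JOIN tgt_person t ON s.id = t.person_id
        WHERE t.year_of_birth <> CAST(substr(s.birth_date, 1, 4) AS INTEGER)
    """).fetchone()[0]
    results["consistency"] = mismatches == 0

    # Syntactic validity (assumed reading): target codes come from an
    # allowed set and are not NULL.
    invalid = cur.execute("""
        SELECT COUNT(*) FROM tgt_person
        WHERE gender IS NULL OR gender NOT IN ('M', 'F', 'U')
    """).fetchone()[0]
    results["syntactic_validity"] = invalid == 0

    return results


if __name__ == "__main__":
    print(run_balancing_tests(build_demo()))
    print(run_balancing_tests(build_demo(faulty=True)))
```

The `faulty=True` variant plays a role loosely similar to the mutation analysis mentioned above: faults are injected into the target data to see whether the assertions notice them. In the approach described in the abstract, such assertions are generated automatically from the source-to-target mappings rather than written by hand as they are here.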