High-Level ETL for Semantic Data Warehouses – Full Version
arXiv (2020)
Abstract
The popularity of the Semantic Web (SW) encourages organizations to organize
and publish semantic data using the RDF model. This growth places new
requirements on Business Intelligence (BI) technologies to enable On-Line
Analytical Processing (OLAP)-like analysis over semantic data. The
incorporation of semantic data into a Data Warehouse (DW) is not supported by
the traditional Extract-Transform-Load (ETL) tools because they do not consider
semantic issues in the integration process. In this paper, we propose a
layer-based integration process and a set of high-level RDF-based ETL
constructs required to define, map, extract, process, transform, integrate,
update, and load (multidimensional) semantic data. Unlike other ETL
tools, we automate the ETL data flows by creating metadata at the schema level,
which relieves ETL developers of the burden of manual mapping at the
ETL operation level. We create a prototype, named Semantic ETL Construct
(SETLCONSTRUCT), based on the innovative ETL constructs proposed here. To
evaluate SETLCONSTRUCT, we use it to create a multidimensional semantic DW by
integrating a Danish Business dataset and an EU Subsidy dataset, and compare it
with the previous programmable framework SETLPROG in terms of productivity,
development time, and performance. The evaluation shows that 1) SETLCONSTRUCT
uses 92% [...] (the extension of SETLCONSTRUCT for generating ETL execution
flow automatically) further reduces the Number of Used Concepts (NOUC) by
another 25% [...] compared to SETLPROG, and is cut by another 27% [...]
SETLCONSTRUCT is scalable and has similar performance compared to SETLPROG.
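
As an illustration of the schema-level idea described above, the following minimal Python sketch shows how a single mapping declared between a source schema and a target vocabulary could drive the instance-level transformation without per-row manual mapping. It is not the SETLCONSTRUCT implementation: the mapping structure, the function name `rows_to_triples`, the example IRIs under `http://example.org/`, and the sample rows are all hypothetical and chosen only for illustration.

```python
# Hypothetical schema-level mapping: source columns -> target RDF properties.
# All names and IRIs here are illustrative assumptions, not taken from the paper.
SCHEMA_MAPPING = {
    "target_class": "http://example.org/onto#Company",
    "key_column": "company_id",
    "iri_prefix": "http://example.org/data/company/",
    "property_map": {
        "name": "http://example.org/onto#name",
        "city": "http://example.org/onto#inCity",
    },
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(rows, mapping):
    """Derive instance-level triples from rows using only the schema-level mapping."""
    triples = []
    for row in rows:
        # Build the subject IRI from the declared key column.
        subject = mapping["iri_prefix"] + str(row[mapping["key_column"]])
        triples.append((subject, RDF_TYPE, mapping["target_class"]))
        for column, prop in mapping["property_map"].items():
            value = row.get(column)
            if value is not None:  # skip missing source values
                triples.append((subject, prop, value))
    return triples


# Made-up sample rows standing in for a tabular source.
rows = [
    {"company_id": 1, "name": "Acme ApS", "city": "Aalborg"},
    {"company_id": 2, "name": "Beta A/S", "city": None},
]
triples = rows_to_triples(rows, SCHEMA_MAPPING)
for triple in triples:
    print(triple)
```

In this sketch the developer edits only `SCHEMA_MAPPING`; the per-row logic is derived from it, which is the general sense in which schema-level metadata can reduce manual mapping at the operation level.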