GENUS: An ETL tool treating the Big Data Variety
2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA) (2016)
Abstract
The data warehouse is the most important component supplying a Business Intelligence system and sits at the core of the Decision Support System. It integrates data from different sources, often scattered and heterogeneous, to help managers in their decision-making. Building a data warehouse therefore requires executing the Extraction-Transformation-Load (ETL) process. In recent years, ETL has been affected by the emergence of Big Data. Some ETL studies have addressed Big Data along its three Vs: Volume, Velocity, and Variety. However, these studies do not address the variety of data types. In this paper, we therefore introduce a new ETL tool that handles this aspect. GENUS, our proposed tool, extracts data from different document types (text, image, and video), transforms them, and loads them into a document data warehouse. GENUS is implemented and validated in a commercial case study.
Keywords
ETL tool, Big Data, extraction-transformation-load process, volume-velocity-variety, GENUS, data extraction, document data warehouse