Modelling Data Pipelines

2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)(2020)

Abstract
Data is the new currency and a key to success. However, collecting high-quality data from multiple distributed sources requires considerable effort, and transporting data from its source to its destination raises several further challenges. Data pipelines are implemented to increase the overall efficiency of data flow from source to destination: they are automated and reduce the human involvement that would otherwise be required. Despite existing research on ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) pipelines, research on this topic remains limited. ETL/ELT pipelines are abstract representations of end-to-end data pipelines. To utilize the full potential of a data pipeline, we should understand the activities in it and how they are connected in an end-to-end data pipeline. This study gives an overview of how to design a conceptual model of a data pipeline, which can further serve as a language of communication between different data teams. It can also be used to automate monitoring, fault detection, mitigation, and alarming at the different steps of a data pipeline.
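The idea of a pipeline as a sequence of named, monitorable steps can be illustrated with a minimal sketch. This is not the authors' model; it is a hypothetical Python example in which each step (extract, transform, load) is wrapped so that a per-step log supports the kind of monitoring and fault detection the abstract mentions. All names (`PipelineStep`, `Pipeline`, the sample steps) are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class PipelineStep:
    """A single named activity in an end-to-end data pipeline."""
    name: str
    run: Callable[[Any], Any]


@dataclass
class Pipeline:
    """Executes steps in order, logging outcomes per step for monitoring."""
    steps: List[PipelineStep]
    log: List[Tuple[str, str]] = field(default_factory=list)

    def execute(self, data: Any) -> Any:
        for step in self.steps:
            try:
                data = step.run(data)
                self.log.append((step.name, "ok"))
            except Exception as exc:
                # Fault detection point: a real pipeline could alarm or
                # trigger mitigation here instead of just re-raising.
                self.log.append((step.name, f"fault: {exc}"))
                raise
        return data


# Hypothetical ETL-style steps operating on in-memory records.
extract = PipelineStep("extract", lambda _: [{"id": 1, "value": " 42 "}])
transform = PipelineStep(
    "transform",
    lambda rows: [{**r, "value": int(r["value"])} for r in rows],
)
load = PipelineStep("load", lambda rows: len(rows))  # pretend-load: row count

pipeline = Pipeline([extract, transform, load])
result = pipeline.execute(None)
```

After execution, `pipeline.log` records one `("step", "ok")` entry per completed step, giving a simple hook for the automated monitoring and alarming discussed above.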
Keywords
Data pipelines, conceptual model, data work-flow, domain specific language, agile methodology