Towards the Specification and Generation of Time Series Datasets from Data Lakes

2023 IEEE 31st International Requirements Engineering Conference Workshops (REW)(2023)

Abstract
These days, more and more organizations are building data lakes to store the information they generate. This information is considered a valuable asset that, if properly analyzed, can support more informed decisions. However, since the analyses to be performed are often not known in advance, the data are stored in a raw format. Consequently, any application built on top of a data lake must carefully elicit which data will be used for a particular analysis and how those data will be transformed so that they fit together into a single dataset. This data selection and preparation task is typically performed by data scientists, who write large and complicated scripts in data management languages to extract and transform the required data. This reduces their productivity, because they must repeatedly write large pieces of highly similar code. It also makes it difficult for domain experts to participate in the process, because they have little understanding of these scripts. To alleviate this problem, this work introduces a work-in-progress version of a high-level declarative language for specifying the requirements that a dataset coming from a data lake must satisfy. Specifications written in this language are then processed to automatically generate the specified dataset, allowing data scientists and domain experts to remain agnostic about the details of how data are retrieved and transformed.
Keywords
time series datasets,time series,specification