SEDAR: A Semantic Data Reservoir for Integrating Heterogeneous Datasets and Machine Learning.

ERCIM News (2023)

Abstract
SEDAR is a comprehensive semantic data lake that supports data ingestion, storage, processing, analytics and governance. The key element of SEDAR is semantic metadata management, which serves many use cases, e.g. provenance, versioning, lineage, dataset similarity and profiling. The generic ingestion interface can deal with any external data source, ranging from files to databases and streams, and incorporates change data capture with data versioning and automatic metadata extraction. Machine learning (ML) is integrated into the data lake: its artefacts (e.g., ML pipelines, notebooks, models) are stored in the data lake, which allows data preparation and ML pipelines to be developed coherently. As all these artefacts are related, their relationships and versions are maintained in the extended metadata repository.
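As a rough illustration of the ideas in the abstract (versioned ingestion, automatic metadata extraction, and lineage between related artefacts), the following minimal sketch may help. It is not SEDAR's implementation or API: the names `MetadataRepository`, `DatasetVersion`, `ingest` and `lineage` are hypothetical, the store is a plain in-memory dictionary, and change detection is approximated with a content checksum rather than real change data capture.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class DatasetVersion:
    """One immutable version of a dataset or ML artefact (hypothetical type)."""
    dataset: str
    version: int
    checksum: str
    metadata: dict
    derived_from: list = field(default_factory=list)


class MetadataRepository:
    """Hypothetical in-memory stand-in for a metadata repository."""

    def __init__(self):
        self.versions = {}  # dataset name -> list of DatasetVersion, oldest first

    def ingest(self, dataset, records, derived_from=None):
        """Register records; create a new version only if the content changed."""
        payload = json.dumps(records, sort_keys=True).encode("utf-8")
        checksum = hashlib.sha256(payload).hexdigest()
        history = self.versions.setdefault(dataset, [])
        if history and history[-1].checksum == checksum:
            return history[-1]  # unchanged content: reuse the latest version
        # Automatic metadata extraction, reduced here to row count and column names.
        metadata = {
            "row_count": len(records),
            "columns": sorted({key for row in records for key in row}),
        }
        entry = DatasetVersion(
            dataset, len(history) + 1, checksum, metadata, list(derived_from or [])
        )
        history.append(entry)
        return entry

    def lineage(self, dataset):
        """Return the names of all upstream artefacts of the latest version."""
        seen = []
        stack = list(self.versions[dataset][-1].derived_from)
        while stack:
            name = stack.pop()
            if name not in seen:
                seen.append(name)
                if name in self.versions:
                    stack.extend(self.versions[name][-1].derived_from)
        return seen
```

In this sketch, ingesting identical records twice returns the same version, while changed records yield a new version number. Registering a model with `derived_from=["sensors"]` makes `lineage("model")` report `sensors` as its upstream dataset.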