Ontario: Federated Query Processing Against A Semantic Data Lake
DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT I (2019)
Abstract
Data lakes enable flexible knowledge discovery and reduce the overhead of materialized data integration. Although data lakes are effective for data storage, query execution over them may be expensive, which demands novel techniques for generating plans that exploit the main characteristics of data lakes. We devise Ontario, a federated query processing approach tailored for large-scale heterogeneous data. Ontario provides efficient and effective query processing over a federation of heterogeneous data sources in a data lake. Ontario relies on source descriptions named RDF Molecule Templates (RDF-MTs), i.e., abstract descriptions of the properties of the entities in a unified schema and of their implementation in a data lake. We empirically evaluate the effectiveness of the Ontario optimization techniques over state-of-the-art benchmarks. The observed results suggest that Ontario can effectively select plans composed of subqueries that can be efficiently executed against heterogeneous data sources in a data lake.
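The abstract describes RDF Molecule Templates only at a high level. As a rough illustration, assume a template can be reduced to an RDF class plus a map from each of its predicates to the data lake sources that implement that predicate. The Python sketch below shows how such descriptions could be used to prune sources for a star-shaped subquery (a set of triple patterns sharing one subject, represented here just by its predicates). The class `MoleculeTemplate`, the function `select_sources`, and the example IRIs and source names are hypothetical; this is not Ontario's actual source selection or planning algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MoleculeTemplate:
    """Hypothetical, simplified RDF Molecule Template."""
    rdf_class: str
    # Maps each predicate of the class to the sources that store it.
    predicate_sources: Dict[str, Set[str]] = field(default_factory=dict)

    def predicates(self) -> Set[str]:
        return set(self.predicate_sources)


def select_sources(star: Set[str],
                   templates: List[MoleculeTemplate]) -> Dict[str, Set[str]]:
    """For each template covering the star subquery, return the sources
    that implement every predicate of the star (and so could evaluate
    the whole star locally)."""
    if not star:
        return {}
    plan: Dict[str, Set[str]] = {}
    for mt in templates:
        if not star <= mt.predicates():
            continue  # template lacks some predicate of the star
        # Intersect the source sets of all predicates in the star.
        candidates = set.intersection(*(mt.predicate_sources[p] for p in star))
        if candidates:
            plan[mt.rdf_class] = candidates
    return plan


# Hypothetical templates over three heterogeneous sources.
templates = [
    MoleculeTemplate("ex:Drug", {"ex:name": {"mysql_db", "mongo_db"},
                                 "ex:target": {"mysql_db"}}),
    MoleculeTemplate("ex:Gene", {"ex:name": {"csv_files"},
                                 "ex:chromosome": {"csv_files"}}),
]

print(select_sources({"ex:name", "ex:target"}, templates))
# → {'ex:Drug': {'mysql_db'}}
```

Here only `ex:Drug` covers both predicates, and only `mysql_db` implements both, so the star would be routed to that single source. A real engine would also have to handle stars that no single source can answer, for example by splitting them across sources and joining the results.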
Keywords
Polystore, Federated engine, Semantic Data Lake