Schema-independent querying for heterogeneous collections in NoSQL document stores

Information Systems(2019)

引用 6|浏览17
暂无评分
摘要
NoSQL document stores are well-tailored to efficiently load and manage massive collections of heterogeneous documents without any prior structural validation. However, this flexibility becomes a serious challenge when querying heterogeneous documents, and hence the user has to build complex queries or reformulate existing queries whenever new schemas are introduced in a collection. In this paper we propose a novel approach, based on formal foundations, for building schema-independent queries which are designed to query multi-structured documents. We present a query enrichment mechanism that consults a pre-constructed dictionary. This dictionary binds each possible path in the documents to all its corresponding absolute paths in all the documents. We automate the process of query reformulation via a set of rules that reformulate most document store operators, such as select, project, unnest, aggregate and lookup. We then produce queries across multi-structured documents which are compatible with the native query engine of the underlying document store. To evaluate our approach, we conducted experiments on synthetic datasets. Our results show that the induced overhead can be acceptable when compared to the efforts needed to restructure the data or the time required to execute several queries corresponding to the different schemas inside the collection.
更多
查看译文
关键词
Information systems,Document stores,Structural heterogeneity,Schema-independent querying
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要