Rewriting complex SPARQL analytical queries for efficient cloud-based processing

Big Data (2015)

Abstract
Many emerging Semantic Web applications combine and aggregate data across domains for analysis. Such analytical queries compute aggregates over multiple groupings of data, resulting in query plans with complex grouping-aggregation constraints. In the context of an RDF analytical query, each such grouping maps to a graph pattern subquery with multiple join operations, and related groupings often result in overlapping graph patterns within the same query. In this paper, we propose a holistic approach to optimizing RDF analytical queries: refactoring them so that common subexpressions are executed once and shared, which enables parallel evaluation of groupings as well as aggregations. Such a rewriting yields shorter execution workflows, which is particularly beneficial for scale-out processing on distributed Cloud systems with multiple I/O phases. Experiments on real-world and synthetic benchmarks confirm that such a rewriting can achieve more efficient execution plans than relational-style SPARQL query plans executed on popular Cloud systems.
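
To make the idea of shared execution concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a toy in-memory triple list, single-valued predicates per subject, and two hypothetical groupings (total price by product and by country) whose graph patterns overlap on the `price` predicate. All names (`TRIPLES`, `scan_once`, `multi_group_sum`) are invented for illustration. Instead of scanning and joining the data once per grouping, the overlapping patterns are matched in a single scan and both aggregations are computed in one pass over the shared result.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("s1", "product", "p1"), ("s1", "price", 10), ("s1", "country", "US"),
    ("s2", "product", "p1"), ("s2", "price", 20), ("s2", "country", "DE"),
    ("s3", "product", "p2"), ("s3", "price", 5), ("s3", "country", "US"),
]


def scan_once(triples, predicates):
    """Match all requested predicates in one scan, producing one row per subject.

    Assumes each predicate is single-valued per subject (later values overwrite).
    """
    rows = defaultdict(dict)
    for s, p, o in triples:
        if p in predicates:
            rows[s][p] = o
    return list(rows.values())


def multi_group_sum(rows, groupings):
    """Compute SUM(measure) GROUP BY key for several groupings in a single pass.

    `groupings` maps a grouping name to a (key_predicate, measure_predicate) pair.
    A row contributes to a grouping only if it binds both predicates, so each
    grouping sees the same rows it would have seen if evaluated on its own.
    """
    totals = {name: defaultdict(int) for name in groupings}
    for row in rows:
        for name, (key, measure) in groupings.items():
            if key in row and measure in row:
                totals[name][row[key]] += row[measure]
    return {name: dict(t) for name, t in totals.items()}


# Two grouping subqueries whose patterns overlap on "price".
groupings = {
    "by_product": ("product", "price"),
    "by_country": ("country", "price"),
}
needed = {p for pair in groupings.values() for p in pair}

rows = scan_once(TRIPLES, needed)          # shared evaluation of the patterns
result = multi_group_sum(rows, groupings)  # both aggregations in one pass
print(result)
```

In a MapReduce setting, the analogous saving would come from fewer jobs and therefore fewer intermediate I/O phases; this single-process sketch only illustrates the sharing itself.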
Keywords
RDF Analytics, Hadoop, MapReduce, Query Rewriting, SPARQL, Semantic Web