Distributed Set Reachability
SIGMOD/PODS '16: International Conference on Management of Data, San Francisco, California, USA, June 2016
Abstract
In this paper, we focus on the efficient and scalable processing of set-reachability queries over a distributed, directed data graph. A set-reachability query is a generalized form of a reachability query in which two sets S and T of source and target vertices, respectively, are given as the query. The result of a set-reachability query is the set of all pairs of source and target vertices (s, t), with s ∈ S and t ∈ T, where t is reachable from s (denoted as S ⇝ T). When the data graph is partitioned into multiple edge- and vertex-disjoint subgraphs (e.g., when distributed across multiple compute nodes in a cluster), we refer to the resulting set-reachability problem as distributed set reachability. The key goals in processing a distributed set-reachability query over a partitioned data graph both efficiently and in a scalable manner are (1) to avoid redundant computations within the local compute nodes as much as possible, (2) to partially evaluate the local components of a set-reachability query S ⇝ T among all compute nodes in parallel, and (3) to minimize both the size and number of messages exchanged among the compute nodes.

Distributed set reachability has a plethora of applications in graph analytics and query processing. The current W3C recommendation for SPARQL 1.1, for example, introduces a notion of labeled property paths, which resolves to processing a form of generalized graph-pattern queries with set-reachability predicates. Moreover, analyzing dependencies among social-network communities inherently involves reachability checks between large sets of source and target vertices. Our experiments confirm very significant performance gains of our approach in comparison to state-of-the-art graph engines such as Giraph++, over a variety of graph collections with up to 1.4 billion edges.
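To make the query definition concrete, the following is a minimal single-machine sketch of a set-reachability query, assuming the graph fits in memory as an edge list. It is a naive baseline (one breadth-first search per source vertex), not the distributed algorithm of the paper. The function name `set_reachability` and the edge-list input format are illustrative assumptions, as is the convention that every vertex reaches itself via the empty path.

```python
from collections import deque


def set_reachability(edges, sources, targets):
    """Return all pairs (s, t), s in sources and t in targets, with t reachable from s.

    Naive baseline: runs one BFS per source over a directed graph given as
    (u, v) edge tuples. Assumption: a vertex is considered reachable from itself.
    """
    # Build an adjacency list for the directed graph.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)

    target_set = set(targets)
    result = set()
    for s in sources:
        visited = {s}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if u in target_set:
                result.add((s, u))
            for v in adjacency.get(u, ()):
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
    return result


if __name__ == "__main__":
    # Hypothetical toy graph: 1 -> 2 -> 3 -> 5 and 4 -> 3; vertex 6 is isolated.
    edges = [(1, 2), (2, 3), (4, 3), (3, 5)]
    print(sorted(set_reachability(edges, {1, 4}, {3, 5, 6})))
```

This baseline repeats work whenever two sources reach the same region of the graph, which is the kind of redundant computation that goal (1) above aims to avoid.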