EdgeFrame: Worst-Case Optimal Joins for Graph-Pattern Matching in Spark

SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, June 2020

Abstract
We describe the design and implementation of EdgeFrame: a graph-specialized Spark DataFrame that caches the edges of a graph in compressed form on all worker nodes of a cluster, and provides a fast and scalable Worst-Case-Optimal Join (WCOJ) that is especially useful for matching complex and cyclic patterns in large graphs. Our choice to forgo shuffle- or communication-based WCOJ is motivated by our analysis of the Shares algorithm for distributed WCOJ, which was proven communication-optimal but which, as we show, quickly deteriorates to a full broadcast of all data even for moderately complex graph patterns. Our work shows that specializing WCOJ to a multi-way self-join, and leveraging compressed storage, provides a significant opportunity for better WCOJ performance. Finally, we investigate WCOJ parallelization and load-balancing strategies and show that fine-grained dynamic load-balancing with work-stealing is to be preferred, creating interesting insights and challenges for the future evolution of the Spark scheduler.
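To make the idea of a WCOJ specialized to a multi-way self-join over an edge table more concrete, the following is a minimal single-machine sketch in Python of a variable-at-a-time join for the triangle pattern a->b, b->c, a->c. It is not EdgeFrame's implementation and does not use Spark; the function names (`build_adjacency`, `intersect_sorted`, `triangles`), the sorted-adjacency-list layout, and the sample edge list are all assumptions chosen for illustration. The point it shows is that the third variable is bound by intersecting two adjacency lists rather than by first materializing all two-edge paths, which is what a sequence of binary joins would do.

```python
from bisect import bisect_left
from collections import defaultdict


def build_adjacency(edges):
    """Map each source vertex to a sorted list of its distinct destinations."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
    return {v: sorted(ns) for v, ns in adj.items()}


def intersect_sorted(xs, ys):
    """Intersect two sorted lists by seeking in the longer one with binary search."""
    if len(xs) > len(ys):
        xs, ys = ys, xs
    out, lo = [], 0
    for x in xs:
        # Resume the search from the previous position; both lists are sorted.
        lo = bisect_left(ys, x, lo)
        if lo == len(ys):
            break
        if ys[lo] == x:
            out.append(x)
    return out


def triangles(edges):
    """Enumerate (a, b, c) with edges a->b, b->c and a->c, one variable at a time."""
    adj = build_adjacency(edges)
    result = []
    for a in sorted(adj):                       # bind a
        for b in adj[a]:                        # bind b from the neighbours of a
            # bind c: it must be a neighbour of both b and a
            for c in intersect_sorted(adj.get(b, []), adj[a]):
                result.append((a, b, c))
    return result


# Hypothetical sample graph: a directed 4-clique with edges from lower to higher ids.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (1, 4)]
found = triangles(edges)
print(found)
```

Because every input of the pattern is the same edge relation, a single adjacency structure serves all three atoms of the join; this is the kind of sharing that a self-join specialization can exploit, here only in toy form.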