A Datalog Engine for Iterative Graph Algorithms on Large Clusters.

DSDIS(2015)

Abstract
Distributed computations on graphs have gained importance with the emergence of large graphs, e.g., in the web or in social networks. Frameworks like Hadoop, Giraph and Spark are used to process them. Yet, they require advanced programming techniques to minimize skew and data shuffling. Declarative, query-like, yet efficient solutions for graph processing, comparable to Pig for general-purpose analytics, are lacking. In this paper we promote the use of declarative Datalog with aggregation for large graph processing. We present an implementation which extends Apache Spark with the capability of executing Datalog queries. This approach makes it possible to express graph algorithms in a well-studied declarative query language and execute them on an existing and mature infrastructure for distributed computation. At the same time, the data processed with Datalog queries is fully integrated with the caching mechanism of Spark and can be part of a larger iterative algorithm.
Keywords
datalog,spark,graph processing,big data,distributed computation
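
To illustrate what expressing a graph algorithm in Datalog means, the following is a minimal single-machine sketch of how a recursive reachability query can be evaluated iteratively. It is not the paper's implementation and does not use Spark; the function name `transitive_closure`, the rule names `edge` and `path`, and the choice of semi-naive evaluation (joining only newly derived facts in each round) are assumptions made for illustration.

```python
def transitive_closure(edges):
    """Evaluate the Datalog program below by semi-naive iteration.

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    `edges` is an iterable of (source, target) pairs. The rule names and
    this evaluation strategy are illustrative assumptions, not taken from
    the paper.
    """
    edges = set(edges)

    # Index edges by source so the join path(X, Y), edge(Y, Z) is a lookup.
    successors = {}
    for x, y in edges:
        successors.setdefault(x, set()).add(y)

    path = set(edges)   # all facts derived so far (first rule)
    delta = set(edges)  # facts that were new in the previous round

    # Iterate until a round derives nothing new (the fixpoint).
    while delta:
        derived = {(x, z) for (x, y) in delta for z in successors.get(y, ())}
        delta = derived - path  # keep only facts not seen before
        path |= delta
    return path


reachable = transitive_closure({(1, 2), (2, 3), (3, 4)})
```

Each round corresponds to one iteration of the kind of iterative algorithm the abstract refers to: only the newly derived facts are joined with the edge relation, and the loop stops once no new facts appear.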