Efficient Maximal Clique Enumeration Over Graph Data

Data Science and Engineering(2017)

引用 15|浏览55
暂无评分
摘要
In a wide variety of emerging data-intensive applications, such as social network analysis, Web document clustering, entity resolution, and detection of consistently co-expressed genes in systems biology, the detection of dense subgraphs (cliques) is an essential component. Unfortunately, this problem is NP-Complete and thus computationally intensive at scale—hence there is a need for efficient processing, as well as the techniques for distributing the computation across multiple machines such that the computation, which is too time-consuming on a single machine, can be efficiently performed on a machine cluster given that it is large enough. In this paper, we propose a new algorithm (called GP) for maximal clique enumeration. It identifies cliques by the operation of binary graph partitioning, which iteratively divides a graph until each task is sufficiently small to be processed in parallel. Given a connected graph G=(V,E) , the GP algorithm has a space complexity of O (| E |) and a time complexity of O(|E|μ (G)) , where μ (G) represents the number of different cliques existing in G . We also present a hybrid algorithm, which can effectively leverage the advantages of both the GP algorithm and the classical Bron-and-Kerbosch (BK) algorithm. Then, we develop corresponding parallel solutions based on the GP and hybrid algorithms. Finally, we evaluate the performance of the proposed solutions on real and synthetic graph data. Our extensive experiments show that in both centralized and parallel setting, our proposed GP and hybrid approaches achieve considerably better performance than the state-of-the-art BK approach. Our parallel solutions are implemented and evaluated on MapReduce, a popular shared-nothing parallel framework, but can easily generalize to other shared-nothing or shared-memory parallel frameworks.
更多
查看译文
关键词
Maximal clique enumeration,Parallel graph processing,Iterative graph partitioning,MapReduce
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要