Similarity Search In Graph Databases: A Multi-Layered Indexing Approach

2017 IEEE 33rd International Conference on Data Engineering (ICDE 2017)

Abstract
In this paper we consider the similarity search problem, which retrieves relevant graphs from a graph database under the well-known graph edit distance (GED) constraint. Formally, given a graph database G = {g_1, g_2, ..., g_n} and a query graph q, we aim to find every graph g_i ∈ G such that the graph edit distance between g_i and q, GED(g_i, q), is within a user-specified GED threshold, t. Despite its theoretical significance and wide applicability, GED-based similarity search is challenging in large graph databases, in particular because of the large number of GED computations incurred, and computing GED has been proven to be NP-hard. We propose a parameterized, partition-based GED lower bound that can be instantiated into a series of tight lower bounds, which work together to prune false-positive graphs from G before the costly GED computation is performed. We design an efficient, selectivity-aware algorithm that partitions the graphs of G into highly selective subgraphs. These subgraphs are then incorporated into a cost-effective, multi-layered indexing structure, ML-Index (Multi-Layered Index), for GED lower-bound cross-checking and false-positive graph filtering with theoretical performance guarantees. Experimental studies on real and synthetic graph databases validate the efficiency and effectiveness of ML-Index, which achieves up to an order of magnitude speedup over the state-of-the-art method for similarity search in graph databases.
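
The filter-then-verify pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: instead of the partition-based lower bound and ML-Index, it uses a much simpler, well-known lower bound based on vertex-label counts and edge counts, and it assumes unit edit costs, undirected graphs, and unlabeled edges. The exact GED routine is a brute-force search that is only feasible for very small graphs. All names (make_graph, label_lower_bound, exact_ged, similarity_search) are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import permutations


def make_graph(labels, edges):
    """labels: {vertex_id: label}; edges: (u, v) pairs (undirected, no self-loops).
    Assumption: vertex ids are never None, since None is used as a dummy vertex."""
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}


def label_lower_bound(g, q):
    """Cheap GED lower bound (unit costs): vertex-label mismatch plus edge-count gap.
    This is a stand-in for the paper's partition-based bound, not a copy of it."""
    common = sum((Counter(g["labels"].values()) & Counter(q["labels"].values())).values())
    vertex_ops = max(len(g["labels"]), len(q["labels"])) - common
    edge_ops = abs(len(g["edges"]) - len(q["edges"]))
    return vertex_ops + edge_ops


def exact_ged(g, q):
    """Brute-force unit-cost GED over all vertex mappings (exponential time)."""
    gv, qv = list(g["labels"]), list(q["labels"])
    n = max(len(gv), len(qv))
    gv += [None] * (n - len(gv))  # None marks a vertex insertion or deletion
    qv += [None] * (n - len(qv))
    best = None
    for perm in permutations(qv):
        mapping = {u: v for u, v in zip(gv, perm) if u is not None}
        cost = 0
        for u, v in zip(gv, perm):
            # Insertion/deletion costs 1; so does relabeling a mapped vertex.
            if u is None or v is None or g["labels"][u] != q["labels"][v]:
                cost += 1
        kept = 0  # edges of g whose image is an edge of q
        for e in g["edges"]:
            a, b = tuple(e)
            image = frozenset((mapping[a], mapping[b]))
            if None not in image and image in q["edges"]:
                kept += 1
        cost += (len(g["edges"]) - kept) + (len(q["edges"]) - kept)
        best = cost if best is None else min(best, cost)
    return best


def similarity_search(database, q, t, ged=exact_ged):
    """Return indices i with GED(database[i], q) <= t, filtering before verifying."""
    results = []
    for i, g in enumerate(database):
        if label_lower_bound(g, q) > t:
            continue  # pruned: the lower bound already exceeds the threshold
        if ged(g, q) <= t:
            results.append(i)
    return results


# Toy data: a three-vertex query and a database of three small graphs.
q = make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])
g1 = make_graph({1: "C", 2: "C", 3: "N"}, [(1, 2), (2, 3)])          # one relabel away
g2 = make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3), (1, 3)])  # one extra edge
g3 = make_graph({i: "S" for i in range(1, 6)}, [(1, 2), (2, 3), (3, 4), (4, 5)])
matches = similarity_search([g1, g2, g3], q, t=1)
```

With threshold t = 1, g3 is discarded by the lower bound alone, so the expensive exact computation runs only on g1 and g2. A tighter bound prunes more candidates before verification, which is the effect the paper's partition-based bounds and ML-Index are designed to achieve at scale.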
Keywords
graph databases, multilayered indexing, similarity search problem, graph edit distance, GED constraint, NP-hard problem, false-positive graphs pruning, ML-Index