Effective Filters and Linear Time Verification for Tree Similarity Joins

2019 IEEE 35th International Conference on Data Engineering (ICDE)(2019)

引用 7|浏览30
暂无评分
摘要
The tree similarity join computes all similar pairs in a collection of trees. Two trees are similar if their edit distance falls within a user-defined threshold. Previous algorithms, which are based on a filter-verify approach, suffer from the following two issues. First, ineffective filters produce a large number of candidates that must be further verified. Second, the candidates are verified by computing the tree edit distance, which is cubic in the number of tree nodes. Thus, these techniques fail to scale to large tree collections and are infeasible even for small collections when the trees are large. In this paper, we present a scalable solution for the tree similarity join that is based on (1) an effective indexing technique that leverages both the labels and the structure of trees to reduce the number of candidates, (2) an efficient upper bound filter that moves many of the candidates directly to the join result without additional verification, (3) a linear time verification technique for the remaining candidates that avoids the expensive tree edit distance. We are the first to propose an efficient, non-trivial upper bound and linear time verification for the tree edit distance. Unlike previous solutions, our approach scales to collections with millions of trees. We improve the overall join time by up to two orders of magnitude w.r.t. the state of the art.
更多
查看译文
关键词
Indexes,Upper bound,Transforms,Conferences,Data engineering,Maximum likelihood detection,Nonlinear filters
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要