Duplicate Reduction In Graph Mining: Approaches, Analysis, And Evaluation

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2018)

Abstract
At the core of graph mining lies the independent expansion of substructures: in each iteration, a substructure (also referred to as a subgraph) independently grows into a number of larger substructures. Such independent expansion invariably generates duplicates. When the graph is partitioned, duplicates are generated both within and across partitions. These duplicates must be eliminated for correctness, so they incur not only generation and storage costs but also additional computation for their elimination. Our primary aim is to design techniques that reduce the generation of duplicate substructures, since we show that duplicate generation cannot be eliminated. This paper introduces three constraint-based optimization techniques, each of which significantly improves the overall mining cost by reducing the number of duplicates generated. These alternatives provide the flexibility to choose the right technique based on graph properties. We establish the theoretical correctness of each technique and analyze it with respect to graph characteristics such as degree, number of unique labels, and label distribution. We also investigate whether combining the techniques yields further duplicate reduction. Finally, we discuss the effects of the constraints with respect to the partitioning schemes used in graph mining. Our experiments demonstrate significant benefits of these constraints in terms of storage, computation, and communication cost (the last specific to partitioned approaches) across graphs with varied characteristics.
Keywords
Graph mining, substructure discovery, constraint-based heuristics, duplicate reduction, partitioning of graphs
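
The abstract's claim that independent expansion generates duplicates can be illustrated with a minimal sketch. This is not one of the paper's three constraint-based techniques; it only shows the problem they address. The toy graph `EDGES` and the helpers `expand` and `expand_all` are hypothetical names introduced for this example, and substructures are modeled simply as sets of edges, ignoring labels and partitions.

```python
from itertools import chain

# Hypothetical toy graph (an assumption for illustration):
# a triangle 1-2-3 with a tail edge 3-4.
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4)]


def expand(sub, edges):
    """Grow one substructure (a frozenset of edges) by each adjacent edge."""
    vertices = set(chain.from_iterable(sub))
    return [
        sub | {e}
        for e in edges
        if e not in sub and (e[0] in vertices or e[1] in vertices)
    ]


def expand_all(subs, edges):
    """Expand every substructure independently, keeping duplicates."""
    return [grown for sub in subs for grown in expand(sub, edges)]


# Start from all one-edge substructures and expand each by one edge.
level1 = [frozenset([e]) for e in EDGES]
generated = expand_all(level1, EDGES)
unique = set(generated)

print(len(generated), "generated,", len(unique), "unique")
```

In this toy case every two-edge substructure is reached from each of its two edges, so half of the generated candidates are duplicates that must be stored and then removed. Reducing how many such candidates are produced in the first place is the cost the abstract targets.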