Optimizing gSpan for Molecular Datasets
msra
Abstract
We propose two optimizations for mining molecular databases with gSpan, one of the state-of-the-art graph mining algorithms. Both optimizations apply to the enumeration of subgraph occurrences in a graph database, which, according to our profiling, is the most expensive operation of gSpan. The first optimization reduces the number of subgraph isomorphisms that need to be accessed for proper support computation by taking into account the symmetries inherent in many chemical molecules, and the second speeds up subgraph isomorphism tests by making use of the non-uniform frequency distribution of atom and bond types. The optimizations are part of a reimplementation of the original gSpan algorithm and are shown to significantly increase performance on two chemical datasets.
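To make the second idea concrete, the following is a minimal sketch of a subgraph isomorphism test that tries the rarest atom type first, so that mismatches are detected early. It is not the authors' implementation: the abstract does not say how the frequency distribution is exploited, so the rarest-first vertex ordering, the graph representation (a pair of an atom-label dict and a bond-label dict), and all function names here are assumptions made for illustration. The sketch also assumes the pattern is connected, as gSpan patterns are.

```python
from collections import Counter


def atom_frequencies(database):
    """Count how often each atom label occurs across all graphs."""
    counts = Counter()
    for labels, _bonds in database:
        counts.update(labels.values())
    return counts


def neighbors(bonds, v):
    """Yield (neighbor, bond_label) pairs for vertex v."""
    for (a, b), bond in bonds.items():
        if a == v:
            yield b, bond
        elif b == v:
            yield a, bond


def bond_between(bonds, a, b):
    """Return the bond label between a and b, or None if there is no bond."""
    if (a, b) in bonds:
        return bonds[(a, b)]
    return bonds.get((b, a))


def rarest_first_order(pattern, freq):
    """Order pattern vertices: rarest atom first, then grow along bonds.

    Assumes the pattern is connected (hypothetical helper, not from the paper).
    """
    labels, bonds = pattern
    order = [min(labels, key=lambda v: (freq[labels[v]], v))]
    while len(order) < len(labels):
        frontier = {n for v in order for n, _ in neighbors(bonds, v)} - set(order)
        order.append(min(frontier, key=lambda v: (freq[labels[v]], v)))
    return order


def has_embedding(pattern, target, freq):
    """Backtracking test: does the pattern occur as a subgraph of target?"""
    p_labels, p_bonds = pattern
    t_labels, t_bonds = target
    order = rarest_first_order(pattern, freq)

    def extend(i, mapping):
        if i == len(order):
            return True
        pv = order[i]
        for tv in t_labels:
            if t_labels[tv] != p_labels[pv] or tv in mapping.values():
                continue
            # Every pattern bond to an already mapped vertex must exist
            # in the target with the same bond label.
            if all(bond_between(t_bonds, tv, mapping[pn]) == pb
                   for pn, pb in neighbors(p_bonds, pv) if pn in mapping):
                mapping[pv] = tv
                if extend(i + 1, mapping):
                    return True
                del mapping[pv]
        return False

    return extend(0, {})


def support(pattern, database):
    """Number of database graphs that contain the pattern at least once."""
    freq = atom_frequencies(database)
    return sum(has_embedding(pattern, g, freq) for g in database)
```

In a molecular database dominated by carbon, starting the match at an oxygen or nitrogen atom leaves far fewer candidate target vertices than starting at a carbon, which is the intuition behind ordering by frequency. The symmetry-based first optimization is not covered by this sketch.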