Optimizing gSpan for Molecular Datasets

msra

Abstract
We propose two optimizations for mining molecular databases with gSpan, one of the state-of-the-art graph mining algorithms. Both optimizations apply to the enumeration of subgraph occurrences in a graph database, which, according to our profiling as well, is the most expensive operation of gSpan. The first optimization reduces the number of subgraph isomorphisms that need to be accessed for proper support computation by considering the symmetries inherent in many chemical molecules, and the second speeds up subgraph isomorphism tests by making use of the non-uniform frequency distribution of atom and bond types. The optimizations are part of a reimplementation of the original gSpan algorithm and are shown to significantly increase the performance on two chemical datasets.
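The abstract does not say how the label frequencies are exploited, so the following is only a minimal sketch of one plausible reading, not the authors' implementation: when testing whether a pattern occurs in a molecule, match the rarest atom label first so that most molecules are rejected early. The graph representation and the names `label_frequencies`, `matching_order`, `contains`, and `support` are hypothetical and introduced here for illustration. Bond-type frequencies and the symmetry optimization are not modelled.

```python
from collections import Counter

# Assumed representation (not from the paper): a molecule is (atoms, bonds),
# where atoms maps vertex id -> atom label and bonds maps
# frozenset({u, v}) -> bond label.

def label_frequencies(database):
    """Count how often each atom label occurs in the whole database."""
    counts = Counter()
    for atoms, _ in database:
        counts.update(atoms.values())
    return counts

def adjacency(bonds):
    """Build vertex -> {neighbor: bond label} from the bond map."""
    adj = {}
    for edge, bond in bonds.items():
        u, v = tuple(edge)
        adj.setdefault(u, {})[v] = bond
        adj.setdefault(v, {})[u] = bond
    return adj

def matching_order(pattern, freq):
    """Order pattern vertices rarest label first, keeping the order connected.

    Assumes the pattern is a connected graph.
    """
    atoms, bonds = pattern
    adj = adjacency(bonds)
    start = min(atoms, key=lambda v: (freq[atoms[v]], v))
    order, seen = [start], {start}
    while len(order) < len(atoms):
        frontier = {n for v in order for n in adj.get(v, {}) if n not in seen}
        nxt = min(frontier, key=lambda v: (freq[atoms[v]], v))
        order.append(nxt)
        seen.add(nxt)
    return order

def contains(pattern, graph, order):
    """Backtracking test: does the pattern embed into the graph?"""
    p_atoms, p_bonds = pattern
    g_atoms, g_bonds = graph
    p_adj, g_adj = adjacency(p_bonds), adjacency(g_bonds)

    def extend(i, mapping):
        if i == len(order):
            return True
        pv = order[i]
        mapped = [n for n in p_adj.get(pv, {}) if n in mapping]
        # Candidates: neighbors of an already matched vertex, else all vertices.
        candidates = g_adj.get(mapping[mapped[0]], {}) if mapped else g_atoms
        used = set(mapping.values())
        for gv in candidates:
            if gv in used or g_atoms[gv] != p_atoms[pv]:
                continue
            # Every pattern bond to a matched vertex must exist with the same label.
            if all(g_adj.get(mapping[n], {}).get(gv) == p_adj[pv][n] for n in mapped):
                mapping[pv] = gv
                if extend(i + 1, mapping):
                    return True
                del mapping[pv]
        return False

    return extend(0, {})

def support(pattern, database):
    """Number of database graphs that contain the pattern at least once."""
    order = matching_order(pattern, label_frequencies(database))
    return sum(contains(pattern, g, order) for g in database)
```

In a carbon-heavy database, a pattern containing an oxygen atom would start its search at the oxygen, so molecules without oxygen are discarded after a single scan of their atom labels.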