We propose fast subtree kernels on graphs
Fast subtree kernels on graphs.
NIPS, pp.1660-1668, (2009)
In this article, we propose fast subtree kernels on graphs. On graphs with n nodes, m edges, and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & Gärtner scales as O(n^2 4^d h). Key to this efficiency is the observation that the Weisfeiler-Lehman test of i...
- Graph kernels have recently evolved into a branch of kernel machines that reaches deep into graph mining.
- While fast computation techniques have been developed for graph kernels based on walks and on limited-size subgraphs, it is unclear how to compute subtree kernels efficiently.
- As a consequence, subtree kernels have been applied to relatively small graphs representing chemical compounds or handwritten digits, with approximately twenty nodes on average.
- In Section 4, the authors compare the two subtree kernels (the classic one and the fast one proposed here) to each other and to a set of four other state-of-the-art graph kernels, and report results on kernel computation runtime and classification accuracy on graph benchmark datasets.
- Several different graph kernels have been defined in machine learning, which can be categorized into three classes: graph kernels based on walks [5, 7] and paths, graph kernels based on limited-size subgraphs [6, 11], and graph kernels based on subtree patterns [9, 10].
- The N^2 sparse vector multiplications that have to be performed for kernel computation with global WL do not dominate runtime here.
- We have defined a fast subtree kernel on graphs that combines scalability with the ability to deal with node labels.
- It is competitive with state-of-the-art kernels on several classification benchmark datasets in terms of accuracy, even reaching the highest accuracy level on three out of four datasets. On large graphs it significantly outperforms them in terms of runtime, including the recently defined efficient computation schemes for random walk kernels and graphlet kernels. This new kernel opens the door to applications of graph kernels on large graphs in bioinformatics, for instance protein function prediction via detailed graph models of protein structure on the amino acid level, or phenotype prediction on gene networks.
- The authors empirically compared the runtime behaviour of the two variants of the Weisfeiler-Lehman (WL) kernel on a dataset of N graphs.
- The first variant computes kernel values pairwise in O(N^2 h m).
- The second variant computes the kernel values in O(N h m + N^2 h n) on the whole dataset simultaneously.
- The authors refer to the former variant as the 'pairwise' WL and to the latter as the 'global' WL.
- The authors observe that the pairwise kernel scales quadratically with dataset size N.
- When varying the number of nodes n per graph, the authors observe that the runtime of global WL scales linearly with n, and is much faster than the pairwise WL for large graphs.
- The authors observe the same picture for the height h of the subtree patterns: the runtime of both kernels grows linearly with h, but the global WL is more efficient in terms of runtime in seconds.
- The graphlet kernel is faster than the WL kernel on MUTAG and the NCI datasets, and about a factor of 3 slower on D&D. However, this efficiency comes at a price, as the kernel based on size-3 graphlets turns out to lead to poor accuracy levels on three datasets. Using larger graphlets with 4 or 5 nodes, which might have been more expressive, led to infeasible runtime requirements in initial experiments (not shown here).
- An exciting algorithmic question for further studies will be to consider kernels on graphs with continuous or high-dimensional node labels and their efficient computation.
- Table 1: Prediction accuracy (± standard error) on graph classification benchmark datasets
- Table 2: CPU runtime for kernel computation on graph classification benchmark datasets
- Two earlier subtree kernels refine the above definition for applications in chemoinformatics and hand-written digit recognition. Mahé and Vert define extensions of the classic subtree kernel that avoid tottering and consider unbalanced subtrees. Both approaches propose to consider α-ary subtrees with at most α children per node. This restricts the set of matchings to matchings of up to α nodes, but the runtime complexity is still exponential in this parameter α, which both papers describe as feasible on small graphs (with approximately 20 nodes) with many distinct node labels. Next, we present a subtree kernel that is efficient to compute on graphs with hundreds or thousands of nodes.
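The complexity bounds quoted in the summary above can be compared with a rough operation count. The following is only a back-of-the-envelope sketch: it evaluates the leading terms of the stated bounds and ignores all constant factors, the function names are hypothetical, and the example sizes are made-up values chosen for illustration, not figures from the paper.

```python
def classic_subtree_cost(n, d, h):
    """Leading term n^2 * 4^d * h of the classic subtree kernel bound (one graph pair)."""
    return n**2 * 4**d * h


def fast_subtree_cost(m, h):
    """Leading term m * h of the fast subtree kernel bound (one graph pair)."""
    return m * h


def pairwise_wl_cost(N, h, m):
    """Leading term N^2 * h * m of the 'pairwise' WL variant on N graphs."""
    return N**2 * h * m


def global_wl_cost(N, h, m, n):
    """Leading terms N*h*m + N^2*h*n of the 'global' WL variant on N graphs."""
    return N * h * m + N**2 * h * n


# Hypothetical small graph pair: 20 nodes, 40 edges, maximum degree 4, height 3.
small_classic = classic_subtree_cost(n=20, d=4, h=3)
small_fast = fast_subtree_cost(m=40, h=3)

# Hypothetical dataset: 1000 graphs with 300 nodes and 700 edges each, height 3.
dataset_pairwise = pairwise_wl_cost(N=1000, h=3, m=700)
dataset_global = global_wl_cost(N=1000, h=3, m=700, n=300)

print("classic vs fast (one pair):", small_classic, small_fast)
print("pairwise vs global WL (dataset):", dataset_pairwise, dataset_global)
```

Under these bounds the global variant only pays off when graphs have more edges than nodes, since its quadratic term depends on n rather than m.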
3.1 The Weisfeiler-Lehman test of isomorphism
Our algorithm for computing a fast subtree kernel builds upon the Weisfeiler-Lehman test of isomorphism, more specifically its 1-dimensional variant, also known as "naive vertex refinement", which we describe in the following.
Assume we are given two graphs G and G' and we would like to test whether they are isomorphic. The 1-dimensional Weisfeiler-Lehman test proceeds in iterations, which we index by h and which comprise the following steps:
Algorithm 1 One iteration of the 1-dimensional Weisfeiler-Lehman test of graph isomorphism
1: Multiset-label determination
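The excerpt above breaks off after the first step of Algorithm 1. As an illustration, here is a minimal Python sketch of the 1-dimensional Weisfeiler-Lehman test in its commonly described form: collect the multiset of neighbour labels, sort it, prefix it with the node's own label, compress each distinct result to a new short label, and compare the label counts of the two graphs. The graph representation (adjacency dictionaries), the function names `wl_iteration` and `wl_test`, and the stopping rule of at most n iterations are assumptions of this sketch, not details taken from the text above.

```python
from collections import Counter


def wl_iteration(graphs, labels):
    """One refinement step on all graphs, sharing one compression dictionary.

    graphs: list of dicts mapping node -> list of neighbour nodes
    labels: list of dicts mapping node -> current label (mutually sortable)
    """
    compress = {}  # signature -> new compressed label, shared across graphs
    new_labels = []
    for adj, lab in zip(graphs, labels):
        new_lab = {}
        for v, neighbours in adj.items():
            # Multiset-label determination: sorted labels of the neighbours.
            multiset = tuple(sorted(lab[u] for u in neighbours))
            # Prefix the multiset with the node's own current label.
            signature = (lab[v], multiset)
            # Label compression: identical signatures get identical new labels.
            if signature not in compress:
                compress[signature] = len(compress)
            # Relabeling.
            new_lab[v] = compress[signature]
        new_labels.append(new_lab)
    return new_labels


def wl_test(g1, labels1, g2, labels2):
    """Return False if the test proves the graphs non-isomorphic, else True.

    True only means the test could not tell the graphs apart.
    """
    if len(g1) != len(g2):
        return False
    graphs = [g1, g2]
    labels = [dict(labels1), dict(labels2)]
    for _ in range(len(g1)):
        if Counter(labels[0].values()) != Counter(labels[1].values()):
            return False
        labels = wl_iteration(graphs, labels)
    return Counter(labels[0].values()) == Counter(labels[1].values())


# Usage: a triangle and a 3-node path, all nodes carrying the same label.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
uniform3 = {0: "a", 1: "a", 2: "a"}
print(wl_test(triangle, uniform3, path, uniform3))
```

The test is one-sided: a 6-cycle and two disjoint triangles are both 2-regular, so with uniform labels every node receives the same label in every iteration and the test cannot separate them even though they are not isomorphic.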
- [1] F. R. Bach. Graph kernels between point clouds. In ICML, pages 25–32, 2008.
- [2] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proc. Intl. Conf. Data Mining, pages 74–81, 2005.
- [3] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. J Med Chem, 34:786–797, 1991.
- [4] P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments. J Mol Biol, 330(4):771–783, Jul 2003.
- [5] T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In B. Schölkopf and M. Warmuth, editors, Sixteenth Annual Conference on Computational Learning Theory and Seventh Kernel Workshop, COLT. Springer, 2003.
- [6] T. Horváth, T. Gärtner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2004.
- [7] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning (ICML), Washington, DC, United States, 2003.
- [8] P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert. Extensions of marginalized graph kernels. In Proceedings of the Twenty-First International Conference on Machine Learning, 2004.
- [9] P. Mahé and J.-P. Vert. Graph kernels based on tree patterns for molecules. q-bio/0609024, September 2006.
- [10] J. Ramon and T. Gärtner. Expressivity versus efficiency of graph kernels. Technical report, First International Workshop on Mining Graphs, Trees and Sequences (held with ECML/PKDD'03), 2003.
- [11] N. Shervashidze, S. V. N. Vishwanathan, T. Petri, K. Mehlhorn, and K. M. Borgwardt. Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and Statistics, 2009.
- [12] S. V. N. Vishwanathan, K. M. Borgwardt, and N. N. Schraudolph. Fast computation of graph kernels. In B. Schölkopf, J. Platt, and T. Hofmann, editors, Advances in Neural Information Processing Systems 19, Cambridge MA, 2007. MIT Press.
- [13] N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In Proc. of ICDM, pages 678–689, Hong Kong, 2006.
- [14] B. Weisfeiler and A. A. Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, Ser. 2, 9, 1968.