An Efficient Sampling Scheme For Comparison of Large Graphs


In this article, we attempt to rectify the situation and make graph kernels applicable to data mining on large graphs and large datasets. Our starting point is the matrix reconstruction theorem, which states that any matrix of size 5 or above can be reconstructed given all its principal minors. By applying this to the adjacency matrix of a graph, we recursively define a graph kernel and show that it can be efficiently computed by using the distribution of all size-4 subgraphs of a graph. This distribution, we argue, is similar to a sufficient statistic of the graph, especially when the graph is large. Exhaustive enumeration of these subgraphs is prohibitively expensive, scaling as O(n^4). However, by bounding the deviation of the empirical estimates of the distribution from the true distribution, we show that it suffices to sample a fixed number of subgraphs. Incidentally, our bounds are stronger than those found in the bioinformatics literature for similar techniques. In our experimental evaluation, our graph kernel outperforms state-of-the-art graph kernels in terms of both running time and classification accuracy.
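The sampling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sample_graphlet_distribution` and `graphlet_kernel`, the adjacency-dictionary input format, and the use of a plain inner product between the two estimated distributions are all assumptions made for illustration. The sketch draws a fixed number of random 4-node subsets, classifies each induced subgraph by its sorted degree sequence (which separates all 11 isomorphism classes of 4-node graphs), and returns the empirical distribution.

```python
import random
from collections import Counter
from itertools import combinations


def sample_graphlet_distribution(adj, num_samples, seed=0):
    """Estimate the distribution of induced 4-node subgraphs by sampling.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    counts = Counter()
    for _ in range(num_samples):
        subset = rng.sample(nodes, 4)  # 4 distinct nodes, uniformly at random
        degrees = dict.fromkeys(subset, 0)
        for u, v in combinations(subset, 2):
            if v in adj[u]:
                degrees[u] += 1
                degrees[v] += 1
        # For 4-node graphs, the sorted degree sequence identifies
        # the isomorphism class uniquely.
        counts[tuple(sorted(degrees.values()))] += 1
    return {key: c / num_samples for key, c in counts.items()}


def graphlet_kernel(p, q):
    """Inner product of two estimated distributions (an assumed, simple choice)."""
    return sum(prob * q.get(key, 0.0) for key, prob in p.items())


# Usage: compare a complete graph and an empty graph on 6 nodes.
complete = {i: {j for j in range(6) if j != i} for i in range(6)}
empty = {i: set() for i in range(6)}
p_complete = sample_graphlet_distribution(complete, num_samples=200)
p_empty = sample_graphlet_distribution(empty, num_samples=200)
```

The cost per graph depends on the number of samples rather than on n^4, which is the point of the sampling scheme; the number of samples needed for a given accuracy is what the deviation bounds in the article address, and it is left as a free parameter here.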
Keywords: data mining, adjacency matrix