An Efficient Sampling Scheme For Comparison of Large Graphs

MLG(2007)

Abstract
In this article, we attempt to rectify the situation and make graph kernels applicable to data mining on large graphs and large datasets. Our starting point is the matrix reconstruction theorem, which states that any matrix of size 5 or above can be reconstructed given all its principal minors. By applying this to the adjacency matrix of a graph, we recursively define a graph kernel and show that it can be efficiently computed by using the distribution of all size-4 subgraphs of a graph. This distribution, we argue, is similar to a sufficient statistic of the graph, especially when the graph is large. Exhaustive enumeration of these subgraphs is prohibitively expensive, scaling as O(n^4). However, by bounding the deviation of the empirical estimates of the distribution from the true distribution, we show that it suffices to sample a fixed number of subgraphs. Incidentally, our bounds are stronger than those found in the bioinformatics literature for similar techniques. In our experimental evaluation, our graph kernel outperforms state-of-the-art graph kernels in terms of both running time and classification accuracy.
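The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the adjacency-dictionary representation, and the choice of uniform sampling of 4-vertex sets are all assumptions made for illustration, and neither the paper's kernel nor its sample-size bounds are reproduced here. The sketch relies on one known fact: for graphs on four vertices, the sorted degree sequence distinguishes all 11 isomorphism classes, so it can serve as a label for each induced subgraph.

```python
import random
from collections import Counter
from itertools import combinations


def graphlet_key(adj, nodes):
    """Label the subgraph induced by `nodes` with its sorted degree sequence.

    `adj` maps each vertex to the set of its neighbours (an assumed
    representation). For 4 vertices this label identifies the isomorphism class.
    """
    degrees = []
    for u in nodes:
        degrees.append(sum(1 for v in nodes if v != u and v in adj[u]))
    return tuple(sorted(degrees))


def exact_graphlet_distribution(adj):
    """Enumerate every 4-vertex subset: O(n^4) work, feasible only for small graphs."""
    counts = Counter(graphlet_key(adj, quad) for quad in combinations(adj, 4))
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}


def sampled_graphlet_distribution(adj, num_samples, rng):
    """Estimate the same distribution from `num_samples` random 4-vertex subsets.

    Hypothetical helper: each draw picks 4 distinct vertices uniformly at random.
    """
    vertices = list(adj)
    counts = Counter()
    for _ in range(num_samples):
        counts[graphlet_key(adj, rng.sample(vertices, 4))] += 1
    return {key: c / num_samples for key, c in counts.items()}


if __name__ == "__main__":
    # A 5-cycle: removing any one vertex leaves a path on 4 vertices.
    cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
    print(exact_graphlet_distribution(cycle))
    print(sampled_graphlet_distribution(cycle, 50, random.Random(0)))
```

In this sketch the cost of the sampled estimate depends on the number of samples rather than on the O(n^4) count of subsets. Two graphs could then be compared through their estimated frequency vectors, although how the paper turns the distribution into a kernel is not specified in the abstract.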
Keywords
data mining, adjacency matrix