Near-optimal Supervised Feature Selection among Frequent Subgraphs

SDM(2009)

Abstract
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the enormous problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, while only a few of them possess enough discriminative power to make them useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that we can obtain a near-optimal solution using greedy feature selection. Second, our submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and helps to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.
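As a rough illustration of greedy feature selection over binary subgraph-indicator vectors, the sketch below may help. The abstract does not define the quality criterion, so the code assumes a stand-in: it counts pairs of graphs from different classes that look identical on the selected features, and greedily adds the feature that reduces this count the most. The names `unresolved_pairs` and `greedy_select` are hypothetical, and nothing here reproduces the gSpan integration or the pruning described above.

```python
from collections import Counter


def unresolved_pairs(X, y, selected):
    """Count pairs of examples from different classes that have identical
    values on every selected feature (assumed stand-in criterion; lower is better)."""
    groups = {}
    for row, label in zip(X, y):
        key = tuple(row[j] for j in selected)
        groups.setdefault(key, Counter())[label] += 1
    total = 0
    for counts in groups.values():
        n = sum(counts.values())
        same_class = sum(c * (c - 1) // 2 for c in counts.values())
        # All pairs in the group minus the pairs that share a class label.
        total += n * (n - 1) // 2 - same_class
    return total


def greedy_select(X, y, k):
    """Pick at most k feature indices, each time adding the feature that
    leaves the fewest unresolved pairs; stop when no feature improves."""
    selected = []
    remaining = set(range(len(X[0])))
    current = unresolved_pairs(X, y, selected)
    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = min(sorted(remaining),
                   key=lambda j: unresolved_pairs(X, y, selected + [j]))
        best_score = unresolved_pairs(X, y, selected + [best])
        if best_score >= current:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return selected


# Toy data: rows are graphs, columns mark whether a (hypothetical)
# frequent subgraph occurs in the graph; y holds the class labels.
X_demo = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y_demo = [0, 0, 1, 1]
print(greedy_select(X_demo, y_demo, k=2))
```

In the toy data, the first column alone separates the two classes, so the greedy loop stops after selecting it. Recomputing the criterion from scratch for every candidate keeps the sketch short; a practical implementation would update the groups incrementally.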
Keywords
anomaly detection,search space,feature selection