Finding Subgroups having Several Descriptions: Algorithms for Redescription Mining

SDM(2008)

Abstract
Given a 0-1 dataset, we consider the redescription mining task introduced by Ramakrishnan, Parida, and Zaki. The problem is to find subsets of the rows that can be (approximately) defined by at least two different Boolean formulae on the attributes. That is, we search for pairs (α, β) of Boolean formulae such that the implications α → β and β → α both hold with high accuracy. We require that the two descriptions α and β are syntactically sufficiently different. Such pairs of descriptions indicate that the subset has different definitions, a fact that gives useful information about the data. We give simple algorithms for this task, and evaluate their performance. The methods are based on pruning the search space of all possible pairs of formulae by different accuracy criteria. The significance of the findings is tested by using randomization methods. Experimental results on simulated and real data show that the methods work well: on simulated data they find the planted subsets, and on real data they produce small and understandable results.
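To make the task concrete, the sketch below scores one candidate pair (α, β) on a small 0-1 dataset. It is illustrative only and is not the paper's algorithm. The abstract does not specify the accuracy criteria, so the sketch assumes two common choices. The first is the confidence of each implication, |supp(α) ∩ supp(β)| / |supp(α)|. The second is the Jaccard coefficient of the two supports. For significance, it uses one simple null model, shuffling each attribute column independently. The paper's own randomization methods may differ. All function names (`support`, `jaccard`, `confidence`, `is_redescription`, `p_value`) and the `min_conf` threshold are hypothetical. The sketch also does not check that α and β are syntactically different.

```python
import random
from typing import Callable, Dict, List, Sequence, Set

Row = Dict[str, int]              # one 0-1 data row, keyed by attribute name
Formula = Callable[[Row], bool]   # a Boolean formula over the attributes


def support(f: Formula, rows: Sequence[Row]) -> Set[int]:
    """Indices of the rows on which formula f evaluates to true."""
    return {i for i, r in enumerate(rows) if f(r)}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B|; 1.0 means both formulae describe exactly the same rows."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def confidence(a: Set[int], b: Set[int]) -> float:
    """Accuracy of the implication A -> B, i.e. |A ∩ B| / |A| (assumed criterion)."""
    return len(a & b) / len(a) if a else 0.0


def is_redescription(alpha: Formula, beta: Formula, rows: Sequence[Row],
                     min_conf: float = 0.9) -> bool:
    """True if both alpha -> beta and beta -> alpha hold with confidence >= min_conf."""
    a, b = support(alpha, rows), support(beta, rows)
    return confidence(a, b) >= min_conf and confidence(b, a) >= min_conf


def permuted(rows: Sequence[Row], rng: random.Random) -> List[Row]:
    """Null model (an assumption): shuffle each column independently, keeping column sums."""
    attrs = list(rows[0])
    cols = {x: [r[x] for r in rows] for x in attrs}
    for col in cols.values():
        rng.shuffle(col)
    return [{x: cols[x][i] for x in attrs} for i in range(len(rows))]


def p_value(alpha: Formula, beta: Formula, rows: Sequence[Row],
            trials: int = 200, seed: int = 0) -> float:
    """Empirical p-value: fraction of randomized datasets with Jaccard >= the observed one."""
    rng = random.Random(seed)
    observed = jaccard(support(alpha, rows), support(beta, rows))
    hits = 0
    for _ in range(trials):
        r = permuted(rows, rng)
        if jaccard(support(alpha, r), support(beta, r)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


rows = [
    {"a": 1, "b": 1, "c": 1, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"a": 0, "b": 1, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 0, "c": 0, "d": 0},
]
alpha = lambda r: r["a"] == 1 and r["b"] == 1   # alpha = a AND b
beta = lambda r: r["c"] == 1 or r["d"] == 1     # beta  = c OR d
gamma = lambda r: r["b"] == 1                   # gamma = b (a weaker description)

print(jaccard(support(alpha, rows), support(beta, rows)))  # 1.0: same three rows
print(is_redescription(alpha, beta, rows))                 # True
print(is_redescription(alpha, gamma, rows))                # False: gamma -> alpha is only 0.75
```

In this toy data, α = a ∧ b and β = c ∨ d both hold on rows 0 to 2, so they redescribe the same subset. γ = b also covers row 3, so the implication γ → α fails the assumed 0.9 threshold. With only six rows, the randomization p-value carries little information. The null model matters more on realistic data sizes.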
Keywords
search space