Fast exhaustive subgroup discovery with numerical target concepts

Data Mining and Knowledge Discovery(2015)

引用 77|浏览59
暂无评分
摘要
Subgroup discovery is a key data mining method that aims at identifying descriptions of subsets of the data that show an interesting distribution with respect to a pre-defined target concept. For practical applications the integration of numerical data is crucial. Therefore, a wide variety of interestingness measures has been proposed in literature that use a numerical attribute as the target concept. However, efficient mining in this setting is still an open issue. In this paper, we present novel techniques for fast exhaustive subgroup discovery with a numerical target concept. We initially survey previously proposed measures in this setting. Then, we explore options for pruning the search space using optimistic estimate bounds. Specifically, we introduce novel bounds in closed form and ordering-based bounds as a new technique to derive estimates for several types of interestingness measures with no previously known bounds. In addition, we investigate efficient data structures, namely adapted FP-trees and bitset-based data representations, and discuss their interdependencies to interestingness measures and pruning schemes. The presented techniques are incorporated into two novel algorithms. Finally, the benefits of the proposed pruning bounds and algorithms are assessed and compared in an extensive experimental evaluation on 24 publicly available datasets. The novel algorithms reduce runtimes consistently by more than one order of magnitude.
更多
查看译文
关键词
Subgroup discovery,Pattern mining,Numerical data,Pruning,Data structures,Data mining,Algorithms
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要