Estimating the Pruned Search Space Size of Subgroup Discovery

ICDM (2022)

Abstract
Subgroup discovery (SD) is a well-established supervised pattern mining approach. A key practical challenge, particularly for interactive mining strategies, is that even experienced practitioners find it difficult to estimate the runtime of an exhaustive search algorithm before actually running it. This is due to the exponential explosion of the candidate search space, sophisticated pruning strategies, and implementation specifics, all of which can affect the runtime by orders of magnitude depending on the dataset and the exact mining task parameters. A subgroup discovery run could take mere minutes or literally years, and we would not know which until afterwards. In this paper, we study the estimation of the complexity and runtime of subgroup discovery algorithms by estimating the pruned search space size, i.e., the number of candidate subgroups that are actually evaluated. We propose a sampling-based algorithm called SDFASTEST, which can effectively estimate the pruned search space size of a search algorithm. In our extensive evaluation on 1026 different tasks with 2 search algorithms, SDFASTEST reduced the average mean absolute log error of the search space size estimation by approximately 94% compared to the best baseline, a depth-based upper bound.
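To make "pruned search space size" and "sampling-based estimation" concrete, here is a minimal sketch. It is NOT SDFASTEST (the abstract does not describe its internals); it applies Knuth's classical random-probe estimator for backtrack trees to a hypothetical toy subgroup search. Assumptions, all illustrative: selectors are random bitmasks over rows; refinements add selectors in canonical (increasing-index) order; a candidate is pruned (evaluated but not expanded) when its support falls below a minimum support; the depth-based upper bound is taken to be the number of all conjunctions of at most `max_depth` selectors; and the log error uses base 10 for a single task. All function names and parameters are hypothetical.

```python
import math
import random

# Hypothetical toy setup: each selector (e.g. "age > 40") is a bitmask of covered rows.

def popcount(x):
    """Number of rows covered by a bitmask."""
    return bin(x).count("1")

def make_selectors(n_rows, n_selectors, p, seed):
    """Random selectors; each covers each row independently with probability p."""
    rng = random.Random(seed)
    selectors = []
    for _ in range(n_selectors):
        mask = 0
        for row in range(n_rows):
            if rng.random() < p:
                mask |= 1 << row
        selectors.append(mask)
    return selectors

def children(cover, last, depth, selectors, min_support, max_depth):
    """Candidates generated by refining a node; empty if the node is pruned.

    Only selectors with a larger index are added, so every conjunction is
    generated at most once. A node below min_support is still evaluated
    (it was generated by its parent) but is not expanded further.
    """
    if depth >= max_depth or popcount(cover) < min_support:
        return []
    return [(cover & selectors[i], i) for i in range(last + 1, len(selectors))]

def exact_count(selectors, n_rows, min_support, max_depth):
    """Pruned search space size by full depth-first enumeration."""
    stack = [((1 << n_rows) - 1, -1, 0)]  # root: empty conjunction covers all rows
    count = 0
    while stack:
        cover, last, depth = stack.pop()
        for child_cover, i in children(cover, last, depth, selectors, min_support, max_depth):
            count += 1  # each generated child is one evaluated candidate
            stack.append((child_cover, i, depth + 1))
    return count

def knuth_estimate(selectors, n_rows, min_support, max_depth, n_probes, seed):
    """Average of Knuth random probes: one random root-to-leaf path per probe.

    At each level the running product of branching factors estimates the
    number of nodes on that level; summing over levels is unbiased.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        cover, last, depth = (1 << n_rows) - 1, -1, 0
        weight, estimate = 1, 0
        while True:
            kids = children(cover, last, depth, selectors, min_support, max_depth)
            if not kids:
                break
            weight *= len(kids)
            estimate += weight
            cover, last = rng.choice(kids)
            depth += 1
        total += estimate
    return total / n_probes

def depth_upper_bound(n_selectors, max_depth):
    """Assumed baseline: all conjunctions of 1..max_depth selectors, ignoring pruning."""
    return sum(math.comb(n_selectors, k) for k in range(1, max_depth + 1))

def abs_log_error(estimate, truth):
    """Absolute log10 error for a single task (averaging over tasks omitted)."""
    return abs(math.log10(estimate) - math.log10(truth))

N_ROWS, MIN_SUPPORT, MAX_DEPTH = 200, 40, 4
selectors = make_selectors(N_ROWS, n_selectors=12, p=0.5, seed=0)
truth = exact_count(selectors, N_ROWS, MIN_SUPPORT, MAX_DEPTH)
est = knuth_estimate(selectors, N_ROWS, MIN_SUPPORT, MAX_DEPTH, n_probes=2000, seed=1)
bound = depth_upper_bound(len(selectors), MAX_DEPTH)
print(f"exact={truth} knuth={est:.1f} depth_bound={bound}")
print(f"abs log10 error: knuth={abs_log_error(est, truth):.3f} "
      f"bound={abs_log_error(bound, truth):.3f}")
```

Minimum-support pruning is used here because it is anti-monotone, so the pruned tree does not depend on the order of the search. With optimistic-estimate pruning against an evolving top-k threshold, the pruned tree also depends on the order in which candidates are visited, which a plain random probe like this one does not model.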
Keywords
pruned search space size, discovery