Mining Statistically Significant Patterns with High Utility

Int. J. Comput. Intell. Syst.(2022)

引用 1|浏览0
暂无评分
摘要
Statistically significant pattern mining (SSPM) is to mine patterns with significance based on hypothesis test. Under the constraint of statistical significance, our study aims to introduce a new preference relation into high utility patterns and to discover high utility and significant patterns (HUSPs) from transaction datasets, which has never been considered in existing SSPM problems. Our approach can be divided into two parts, HUSP-Mining and HUSP-Test. HUSP-Mining looks for HUSP candidates and HUSP-Test tests their significance. HUSP-Mining is not outputting all high utility itemsets (HUIs) as HUSP candidates; it is established based on candidate length and testable support requirements which can remove many insignificant HUIs early in the mining process; compared with the traditional HUIs mining algorithm, it can get candidates in a short time without losing the real HUSPs. HUSP-Test is to draw significant patterns from the results of HUSP-Mining based on Fisher’s test. We propose an iterative multiple testing procedure, which can alternately and efficiently reject a hypothesis and safely ignore the hypotheses that have less utility than the rejected hypothesis. HUSP-Test controls Family-wise Error Rate (FWER) under a user-defined threshold by correcting the test level which can find more HUSPs than standard Bonferroni’s control. Substantial experiments on real datasets show that our algorithm can draw HUSPs efficiently from transaction datasets with strong mathematical guarantee.
更多
查看译文
关键词
Statistically significant patterns,Fisher's test,High utility,FWER
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要