SPEck: mining statistically-significant sequential patterns efficiently with exact sampling

Data Mining and Knowledge Discovery (2022)

Abstract
We study the problem of efficiently mining statistically-significant sequential patterns from large datasets, under different null models. We consider one null model presented in the literature, and introduce two new ones that preserve different properties of the observed dataset. We describe SPEck, a generic framework for significant sequential pattern mining that can be instantiated with any null model, given a procedure for sampling datasets according to the null distribution. For the previously-proposed model, we introduce a novel procedure that samples exactly according to the null distribution, whereas existing procedures are approximate samplers. Our exact sampler is also more computationally efficient and much faster in practice. For the null models we introduce, we give exact and/or almost uniform samplers. Our experimental evaluation shows that exact samplers can be orders of magnitude faster than approximate ones, and that they scale well.
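As a rough illustration of the idea that a significance test can be parameterized by a null-model sampler, the following minimal Python sketch estimates a Monte Carlo empirical p-value for a single pattern. It is not the SPEck algorithm: the names `support`, `shuffle_sampler`, and `empirical_p_value` are hypothetical, the null model shown (keep each sequence's multiset of itemsets and shuffle their order uniformly at random) is an assumption chosen for illustration, and the sketch ignores the multiple-hypothesis correction that a full mining procedure would need.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]      # a sequence is an ordered tuple of itemsets
Dataset = List[Seq]
Sampler = Callable[[Dataset, random.Random], Dataset]


def is_subsequence(pattern: Seq, seq: Seq) -> bool:
    """True if the pattern's itemsets are contained, in order, in itemsets of seq."""
    pos = 0
    for itemset in seq:
        # Greedy earliest match: advance when the next pattern itemset fits.
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(pattern: Seq, dataset: Dataset) -> int:
    """Number of sequences in the dataset that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in dataset)


def shuffle_sampler(dataset: Dataset, rng: random.Random) -> Dataset:
    """Assumed null model (illustrative): uniformly shuffle the itemsets of each sequence."""
    sampled = []
    for s in dataset:
        itemsets = list(s)
        rng.shuffle(itemsets)  # uniform random permutation of positions
        sampled.append(tuple(itemsets))
    return sampled


def empirical_p_value(pattern: Seq, dataset: Dataset, sampler: Sampler,
                      num_samples: int, rng: random.Random) -> float:
    """Monte Carlo p-value: share of sampled datasets with support >= the observed one."""
    observed = support(pattern, dataset)
    at_least = sum(
        support(pattern, sampler(dataset, rng)) >= observed
        for _ in range(num_samples)
    )
    # The +1 terms count the observed dataset itself, so the estimate is never 0.
    return (1 + at_least) / (1 + num_samples)
```

Because the sampler is passed in as a function, a different null model could be swapped in without touching the testing code, which is the kind of separation the abstract describes.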
Keywords
Hypothesis testing, Significant Pattern Mining, Statistically-sound Knowledge Discovery, Transactional datasets