ProSecCo: progressive sequence mining with convergence guarantees

2018 IEEE International Conference on Data Mining (ICDM)(2019)

引用 11|浏览154
暂无评分
摘要
We present ProSecCo , an algorithm for the progressive mining of frequent sequences from large transactional datasets: It processes the dataset in blocks and it outputs, after having analyzed each block, a high-quality approximation of the collection of frequent sequences. ProSecCo can be used for interactive data exploration, as the intermediate results enable the user to make informed decisions as the computation proceeds. These intermediate results have strong probabilistic approximation guarantees and the final output is the exact collection of frequent sequences. Our correctness analysis uses the Vapnik–Chervonenkis (VC) dimension, a key concept from statistical learning theory. The results of our experimental evaluation of ProSecCo on real and artificial datasets show that it produces fast-converging high-quality results almost immediately. Its practical performance is even better than what is guaranteed by the theoretical analysis, and ProSecCo can even be faster than existing state-of-the-art non-progressive algorithms. Additionally, our experimental results show that ProSecCo uses a constant amount of memory, and orders of magnitude less than other standard, non-progressive, sequential pattern mining algorithms.
更多
查看译文
关键词
Approximation algorithms, Interactive data analysis, Pattern mining, Sampling, VC-dimension
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要