ParaDiS: a Parallel and Distributed framework for Significant pattern mining.

CCGridW(2023)

Abstract
Mining patterns that have a high association with a class label is a supervised data mining technique used in many applications. Because many patterns are tested with statistical tests to find all interesting ones, some associations are likely to arise by chance. The state-of-the-art TopKWY algorithm mines the top-k interesting patterns while controlling the family-wise error rate (FWER) in the result set. TopKWY is a sequential algorithm that internally uses compute-intensive closed pattern mining. Moreover, it tests several patterns against thousands of permuted class labels to control the FWER. To the best of our knowledge, no parallel or distributed implementation exists to address the scalability challenges faced by TopKWY. The tree formed by the patterns explored in TopKWY is inherently irregular, and the search strategy used for exploration, namely best-first search, is non-trivial to emulate in a distributed setup. This paper designs and implements ParaDiS, a novel parallel and distributed framework for mining the top-k statistically significant patterns. We compare its performance with the sequential TopKWY algorithm on real-world datasets and observe a significant reduction in execution time. We further show that our framework achieves good speedup, minimal communication overhead, and faster pruning of non-promising branches through efficient sharing of the significance threshold.
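To make the permutation-testing step mentioned in the abstract concrete, the following is a minimal sketch of a generic Westfall-Young style min-p procedure. It is not the TopKWY or ParaDiS implementation. The function names `fisher_p` and `wy_threshold`, the choice of Fisher's exact test as the test statistic, the representation of a pattern as the set of transaction indices it covers, and the toy data are all assumptions made for illustration.

```python
import random
from math import comb


def fisher_p(a, x, n1, n):
    """Upper-tail Fisher exact p-value for one pattern (hypothetical helper).

    a  = positive-class transactions containing the pattern
    x  = all transactions containing the pattern
    n1 = positive-class transactions, n = all transactions
    """
    # Hypergeometric tail: probability of seeing at least `a` positives
    # among the `x` covered transactions under random labelling.
    tail = sum(comb(n1, k) * comb(n - n1, x - k)
               for k in range(a, min(x, n1) + 1))
    return tail / comb(n, x)


def wy_threshold(supports, labels, alpha=0.05, n_perm=1000, seed=0):
    """Estimate a significance threshold from permuted class labels.

    supports: list of sets of transaction indices, one set per pattern
    labels:   list of 0/1 class labels, one per transaction
    Patterns whose p-value is strictly below the returned value would be
    reported; at most floor(alpha * n_perm) permutations have a minimum
    p-value that small.
    """
    rng = random.Random(seed)
    n, n1 = len(labels), sum(labels)
    perm = list(labels)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any real pattern/label association
        min_ps.append(min(
            fisher_p(sum(perm[i] for i in s), len(s), n1, n)
            for s in supports
        ))
    min_ps.sort()
    return min_ps[int(alpha * n_perm)]


# Toy data (assumed): 10 transactions, 4 in the positive class.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
supports = [{0, 1, 2}, {3, 4, 5, 6}, {0, 4, 7, 9}]
threshold = wy_threshold(supports, labels, n_perm=200)
```

Each permutation requires re-evaluating every candidate pattern, which suggests why testing against thousands of permuted labels is costly and why the abstract treats it as a scalability concern.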
Keywords
significant pattern discovery, distributed best-first search, irregular structure, dynamic load balancing, scalable, family-wise-error rate, closed pattern