Distributed Sequential Pattern Mining in Large Scale Uncertain Databases.
PAKDD(2016)
Abstract
Sequential pattern mining (SPM) is an important application in uncertain databases, but it is challenging in terms of efficiency and scalability. In this paper, we develop a dynamic programming (DP) approach to mine probabilistic frequent sequential patterns on the distributed computing platform Spark. Directly applying the DP method to Spark is impractical because its high memory consumption may cause heavy JVM garbage collection overhead. Therefore, we design a memory-efficient distributed DP approach and use an extended prefix-tree to store intermediate results efficiently. Extensive experimental results at various scales show that our method is orders of magnitude faster than straightforward approaches.
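The abstract does not spell out the DP recurrence, so the following is only a minimal single-machine sketch of the standard support-distribution DP commonly used to decide whether a pattern is probabilistically frequent. It assumes that each uncertain sequence contains the pattern independently with a known probability; the function name `frequentness_probability` and its parameters are hypothetical, and the code involves neither Spark nor the extended prefix-tree from the paper. It keeps a single array of `min_sup + 1` entries rather than a full two-dimensional table, which hints at why memory layout matters for this kind of DP.

```python
from typing import Sequence


def frequentness_probability(probs: Sequence[float], min_sup: int) -> float:
    """Return P(support >= min_sup) for one candidate pattern.

    Assumption (not from the paper): probs[i] is the probability that
    sequence i contains the pattern, and sequences are independent.
    """
    if min_sup <= 0:
        return 1.0
    if min_sup > len(probs):
        return 0.0

    # dp[j] for j < min_sup: probability that exactly j sequences seen so
    # far contain the pattern. dp[min_sup] is an absorbing state holding
    # the probability that at least min_sup sequences contain it.
    dp = [0.0] * (min_sup + 1)
    dp[0] = 1.0

    for p in probs:
        # Move mass from "min_sup - 1 hits" into the absorbing state.
        dp[min_sup] += dp[min_sup - 1] * p
        # Update in decreasing j so dp[j - 1] still holds the old value.
        for j in range(min_sup - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p

    return dp[min_sup]
```

A pattern would then be reported as probabilistically frequent when the returned value meets a user-chosen probability threshold; that threshold, like the rest of this sketch, is an assumption rather than a detail taken from the paper.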
Keywords
Uncertain databases, Sequential pattern mining, Distributed computing