Distributed Sequential Pattern Mining in Large Scale Uncertain Databases.

PAKDD(2016)

Abstract
Sequential pattern mining (SPM) is an important application in uncertain databases, but achieving efficiency and scalability is challenging. In this paper, we develop a dynamic programming (DP) approach to mine probabilistic frequent sequential patterns on the distributed computing platform Spark. Directly applying the DP method to Spark is impractical because its high memory consumption can cause heavy JVM garbage collection overhead. Therefore, we design a memory-efficient distributed DP approach and use an extended prefix-tree to store intermediate results efficiently. Extensive experiments at various scales show that our method is orders of magnitude faster than straightforward approaches.
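The abstract does not give the DP recurrence, so the following is only a minimal sketch of the kind of dynamic program commonly used to test whether a pattern is probabilistically frequent, not the paper's distributed algorithm. It assumes that each sequence supports the pattern independently with a known probability; the names `frequentness_probability`, `is_probabilistic_frequent`, `min_sup`, and `min_prob` are hypothetical and chosen for illustration. A single rolling array keeps memory proportional to `min_sup` rather than to the number of sequences.

```python
from typing import Sequence


def frequentness_probability(probs: Sequence[float], min_sup: int) -> float:
    """Return P(support >= min_sup), assuming sequence i supports the
    pattern independently with probability probs[i] (sketch assumption)."""
    if min_sup <= 0:
        return 1.0
    if min_sup > len(probs):
        return 0.0
    # dp[j] for j < min_sup: probability that exactly j sequences seen so far
    # support the pattern. dp[min_sup] pools "min_sup or more".
    dp = [0.0] * (min_sup + 1)
    dp[0] = 1.0
    for p in probs:
        # The top cell absorbs mass from the cell below and never loses any.
        dp[min_sup] += dp[min_sup - 1] * p
        # Update in place from high to low so dp[j - 1] is still the old value.
        for j in range(min_sup - 1, 0, -1):
            dp[j] = dp[j] * (1.0 - p) + dp[j - 1] * p
        dp[0] *= 1.0 - p
    return dp[min_sup]


def is_probabilistic_frequent(
    probs: Sequence[float], min_sup: int, min_prob: float
) -> bool:
    """A pattern counts as probabilistically frequent when its support
    reaches min_sup with probability at least min_prob."""
    return frequentness_probability(probs, min_sup) >= min_prob
```

For example, with two sequences that each support a pattern with probability 0.5, `frequentness_probability([0.5, 0.5], 2)` is the chance that both support it. In a distributed setting, the per-sequence probabilities would first have to be computed and gathered per candidate pattern, which this sketch leaves out.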
Keywords
Uncertain databases, Sequential pattern mining, Distributed computing