Mining Probabilistically Frequent Sequential Patterns in Large Uncertain Databases

IEEE Transactions on Knowledge and Data Engineering (2014)

Abstract
Data uncertainty is inherent in many real-world applications such as environmental surveillance and mobile tracking. Mining sequential patterns from inaccurate data, such as sensor readings and GPS trajectories, is important for discovering hidden knowledge in such applications. In this paper, we propose to measure pattern frequentness based on the possible world semantics. We establish two uncertain sequence data models abstracted from many real-life applications involving uncertain sequence data, and formulate the problem of mining probabilistically frequent sequential patterns (p-FSPs) from data that conform to our models. However, the number of possible worlds is extremely large, which makes a direct approach to mining prohibitively expensive. Inspired by the well-known PrefixSpan algorithm, we develop two new algorithms, collectively called U-PrefixSpan, for p-FSP mining. U-PrefixSpan effectively avoids the problem of “possible worlds explosion” and, when combined with our four pruning and validating methods, achieves even better performance. We also propose a fast validating method to further speed up U-PrefixSpan. The efficiency and effectiveness of U-PrefixSpan are verified through extensive experiments on both real and synthetic datasets.
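To make the idea of measuring frequentness under possible world semantics concrete, the following is a minimal sketch, not the paper's U-PrefixSpan algorithm. It assumes a simplified setting that the abstract does not spell out: each sequence contains the pattern independently with a known probability, and a pattern counts as probabilistically frequent when the probability that its support reaches `min_sup` is at least a threshold. The function names `frequentness_probability` and `brute_force`, and the parameters `contain_probs` and `min_sup`, are hypothetical. The dynamic program computes the same value as enumerating every possible world, without the exponential cost.

```python
from itertools import product


def frequentness_probability(contain_probs, min_sup):
    """Return P(support >= min_sup), assuming sequence i contains the
    pattern independently with probability contain_probs[i] (an assumption
    of this sketch, not a detail taken from the abstract)."""
    # dist[k] = probability that exactly k of the sequences processed
    # so far contain the pattern.
    dist = [1.0]
    for p in contain_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # pattern absent from this sequence
            nxt[k + 1] += mass * p       # pattern present in this sequence
        dist = nxt
    return sum(dist[min_sup:])


def brute_force(contain_probs, min_sup):
    """Same quantity, computed by enumerating all 2^n possible worlds."""
    total = 0.0
    for world in product([0, 1], repeat=len(contain_probs)):
        if sum(world) >= min_sup:
            weight = 1.0
            for present, p in zip(world, contain_probs):
                weight *= p if present else 1.0 - p
            total += weight
    return total


if __name__ == "__main__":
    probs = [0.9, 0.6, 0.3]
    print(frequentness_probability(probs, 2))
    print(brute_force(probs, 2))
```

The brute-force version visits 2^n worlds for n sequences, which illustrates why enumerating possible worlds is prohibitively expensive, while the dynamic program needs only O(n^2) arithmetic operations under the independence assumption.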
Keywords
p-fsp mining, database management systems, frequent patterns, validating method, uncertain databases, approximate algorithm, large uncertain databases, uncertain sequence data models, possible world semantics, hidden knowledge discovery, world semantics, pruning method, worlds explosion avoidance, data mining, mining methods and algorithms, u-prefixspan, prefixspan algorithm, data uncertainty, probabilistically frequent sequential pattern mining, probability, pattern frequentness measurement, databases, data models, possible worlds, data model, probabilistic logic