Efficient high-utility occupancy itemset mining algorithm on massive data

Expert Systems with Applications(2022)

Abstract
Mining interesting itemsets from massive data is an important topic in data mining. Most existing studies use frequency or utility as the primary measure, but each measure has limitations when used alone: itemsets with high frequency may yield low profit, while itemsets with high utility may appear only occasionally, so either measure can be misleading. In addition, existing algorithms can handle only small- to medium-scale databases, and their performance degrades significantly as the data grows. To address these drawbacks, this paper proposes SHO (Suffix-based High-utility Occupancy itemset mining), a novel high utility occupancy itemset mining algorithm that considers both the quantities and the profits of itemsets. SHO is built on suffix-based partitioning, generation pruning, and itemset linking, which allow it to mine high utility occupancy itemsets on large-scale databases effectively. First, the database is divided into non-overlapping suffix-based partitions stored in vertical format, so that the support and utility occupancy of an itemset can be computed within a single partition rather than by traversing the whole database. In addition, two optimization strategies and four pruning strategies are proposed to speed up SHO. Extensive experiments show that SHO substantially outperforms the current state-of-the-art algorithm, improving efficiency by up to three orders of magnitude.
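
The abstract does not define its two measures, so the following is only a minimal sketch under stated assumptions, not the SHO algorithm itself. It assumes the definition of utility occupancy that is common in the literature: the share of a transaction's total utility contributed by an itemset, averaged over the transactions that contain the itemset. It also assumes a simple vertical layout (item to transaction-id to utility). The toy data and the names `build_vertical` and `support_and_occupancy` are hypothetical, and suffix-based partitioning and the pruning strategies are not modeled here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical unit profit of each item.
profits: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def build_vertical(
    transactions: List[Dict[str, int]], profits: Dict[str, int]
) -> Tuple[Dict[str, Dict[int, int]], Dict[int, int]]:
    """Build a vertical layout: item -> {tid: utility}, plus tid -> transaction utility."""
    vertical: Dict[str, Dict[int, int]] = {}
    tu: Dict[int, int] = {}
    for tid, transaction in enumerate(transactions):
        tu[tid] = 0
        for item, qty in transaction.items():
            utility = qty * profits[item]  # utility = quantity * unit profit
            vertical.setdefault(item, {})[tid] = utility
            tu[tid] += utility
    return vertical, tu


def support_and_occupancy(
    itemset: Iterable[str],
    vertical: Dict[str, Dict[int, int]],
    tu: Dict[int, int],
) -> Tuple[int, float]:
    """Return (support, utility occupancy) of an itemset from the vertical layout."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    # Supporting transactions = intersection of the items' tid sets.
    tids = set(vertical.get(items[0], {}))
    for item in items[1:]:
        tids.intersection_update(vertical.get(item, {}))
    if not tids:
        return 0, 0.0
    # Per-transaction share of utility, averaged over supporting transactions.
    total_share = sum(
        sum(vertical[item][tid] for item in items) / tu[tid] for tid in tids
    )
    return len(tids), total_share / len(tids)


vertical, tu = build_vertical(transactions, profits)
support, occupancy = support_and_occupancy(["a", "b"], vertical, tu)
print(support, occupancy)
```

Under these assumptions, an itemset would qualify as a high utility occupancy itemset when both its support and its utility occupancy reach user-chosen minimum thresholds. This combination of the two measures is what the abstract contrasts with using frequency or utility alone.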
Keywords
Massive data, High utility occupancy pattern mining, Suffix-based partitioning, LI strategy, RTI optimization strategy