Mining Interesting Sequential Patterns using a Novel Balanced Utility Measure

Knowledge-Based Systems (2024)

Abstract
High utility sequential pattern mining (HUSM) is an emerging task in data mining. Its goal is to identify high utility sequential patterns (HUSPs), that is, sequential patterns in a quantitative sequence database that have high importance as measured by a utility function. A limitation of HUSM is that a pattern may appear multiple times in an input sequence, so its utility in that sequence can be calculated in many different ways. Until now, most studies on HUSM have focused on two utility functions, the maximum and the minimum utility, which define the utility of a pattern in a sequence as the largest or the smallest utility value among its occurrences, respectively. These two functions are extremes: they represent the best and the worst case. This makes them unsuitable for many practical situations, such as business decision-making, where overestimating or underestimating the utility can be very risky. To avoid these extremes, this paper introduces a novel utility function ū, called balanced utility, which evaluates the importance of a pattern based on the average over its occurrences in a sequence. To mine HUSPs efficiently with ū, two novel upper bounds (UBs) and a weak UB on ū are developed. These bounds serve as the theoretical basis for new pruning strategies, which are integrated with an ESUL structure in a novel algorithm named MISP-BU for efficiently mining frequent HUSPs with ū. Extensive experiments confirm that MISP-BU is highly efficient in terms of execution time, memory usage, and scalability.
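The contrast between the maximum, minimum, and balanced utility can be illustrated with a small sketch. The abstract does not give the formal definitions, so the code below rests on stated assumptions: a sequence is a list of itemsets mapping each item to its already-computed utility, a pattern is a list of single items matched at strictly increasing positions, and the balanced utility is taken to be the arithmetic mean of the occurrence utilities. The function names `occurrence_utilities` and `utility_measures` are hypothetical and are not taken from the paper or from MISP-BU.

```python
from itertools import combinations


def occurrence_utilities(sequence, pattern):
    """Return the utility of every occurrence of `pattern` in `sequence`.

    Assumption: `sequence` is a list of dicts (item -> utility) and
    `pattern` is a list of single items matched in strictly increasing
    itemset positions. This brute-force enumeration is for illustration
    only; it is not the paper's mining procedure.
    """
    utilities = []
    for positions in combinations(range(len(sequence)), len(pattern)):
        pairs = list(zip(pattern, positions))
        if all(item in sequence[pos] for item, pos in pairs):
            utilities.append(sum(sequence[pos][item] for item, pos in pairs))
    return utilities


def utility_measures(sequence, pattern):
    """Compute max, min, and (assumed) balanced utility of a pattern."""
    utils = occurrence_utilities(sequence, pattern)
    if not utils:
        return {"max": 0, "min": 0, "balanced": 0}
    return {
        "max": max(utils),                   # best case
        "min": min(utils),                   # worst case
        "balanced": sum(utils) / len(utils)  # assumed: mean over occurrences
    }


# Toy sequence: the pattern <a, b> occurs four times with different utilities.
sequence = [{"a": 2, "b": 6}, {"a": 10}, {"b": 1}, {"b": 4}]
measures = utility_measures(sequence, ["a", "b"])
print(measures)
```

In this toy sequence the four occurrences of the pattern have utilities 3, 6, 11, and 14, so the maximum utility reports 14, the minimum reports 3, and the averaged value of 8.5 sits between the two extremes.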
Keywords
Utility mining, high utility sequence, upper bound, weak upper bound, pruning strategy