Mining Interesting Sequential Patterns using a Novel Balanced Utility Measure

Knowledge-Based Systems (2024)

High utility sequential pattern mining (HUSM) is an emerging task in data mining. Its goal is to identify high utility sequential patterns (HUSPs), that is, sequential patterns in a quantitative sequence database that have high importance as measured by a utility function. A difficulty in HUSM is that a pattern may occur multiple times in an input sequence, so its utility in that sequence can be calculated in several different ways. Until now, most studies on HUSM have focused on two utility functions, called the maximum and minimum utility, which define the utility of a pattern in a sequence as the largest or smallest utility value among its occurrences, respectively. However, these two functions are extremes: they represent the best and worst cases. This is unsuitable for many practical situations, such as business decision-making, where overestimating or underestimating the utility can be very risky. To avoid these extremes, this paper introduces a novel utility function ū, called balanced utility, which evaluates the importance of a pattern based on the average over its occurrences in a sequence. To efficiently mine HUSPs with ū, two novel upper bounds (UBs) and a weak UB on ū are developed. These bounds serve as the theoretical basis for new pruning strategies, which are integrated with an ESUL structure in a novel algorithm named MISP-BU for efficiently mining frequent HUSPs with ū. Extensive experiments confirm that MISP-BU is highly efficient in terms of execution time, memory usage, and scalability.
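To make the contrast between the three utility functions concrete, the following is a minimal sketch, not the paper's MISP-BU algorithm. It assumes that a quantitative sequence is a list of itemsets mapping each item to its utility, that an occurrence is any order-preserving match of the pattern's itemsets, and that the balanced utility is the plain arithmetic mean of the occurrence utilities; the paper's exact definition of ū may differ. The names `occurrence_utilities`, `sequence`, and `pattern` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean


def occurrence_utilities(sequence, pattern):
    """Return the utility of every occurrence of `pattern` in `sequence`.

    `sequence` is a list of dicts (item -> utility), one dict per itemset.
    `pattern` is a list of sets of items.
    Brute-force enumeration, for illustration only (no pruning).
    """
    utilities = []
    # Try every increasing choice of positions for the pattern's itemsets.
    for positions in combinations(range(len(sequence)), len(pattern)):
        matched = all(
            set(items).issubset(sequence[pos])
            for items, pos in zip(pattern, positions)
        )
        if matched:
            utilities.append(
                sum(
                    sequence[pos][item]
                    for items, pos in zip(pattern, positions)
                    for item in items
                )
            )
    return utilities


# Toy sequence: each itemset maps an item to its utility in that itemset.
sequence = [{"a": 2, "b": 3}, {"a": 4}, {"b": 1, "c": 5}, {"b": 6}]
pattern = [{"a"}, {"b"}]  # "a" followed later by "b"

utils = occurrence_utilities(sequence, pattern)
u_max = max(utils)        # maximum utility: best case
u_min = min(utils)        # minimum utility: worst case
u_balanced = mean(utils)  # assumed balanced utility: average over occurrences

print(utils, u_max, u_min, u_balanced)  # [3, 8, 5, 10] 10 3 6.5
```

In this toy sequence the pattern occurs four times, with utilities 3, 8, 5, and 10. The maximum utility reports 10 and the minimum reports 3, while the averaged value of 6.5 lies between the two extremes.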
Key words
Utility mining, high utility sequence, upper bound, weak upper bound, pruning strategy