Mining Productive Itemsets in Dynamic Databases

IEEE ACCESS(2020)

引用 2|浏览8
暂无评分
摘要
Discovering frequent itemsets is a data analysis task used in numerous domains. It consists of finding sets of items (itemsets) that frequently appear in a set of database records (also called transactions). Though discovering frequent itemsets is useful, it can produce a large amount of spurious patterns. As a result, the user may spend a great amount of time to analyze the itemsets found by a frequent itemset mining algorithm to find truly interesting patterns. Hence, in recent years, a key research topic has emerged which is to discover statistically significant patterns in databases. The most popular model for identifying itemsets that are statistically significant is to discover non-redundant productive itemsets. The state-of-the-art algorithm to extract this set of patterns is OPUS-Miner. A key drawback of that algorithm is that it is designed to be applied to a static database. Moreover, a second drawback of OPUS-Miner is that it discovers all patterns in a database. In other words, the user cannot search for itemsets containing some specific items. This paper addresses these issues by defining the novel problem of discovering targeted non redundant productive itemsets in dynamic databases. An algorithm named IDPI+ (Interactive Discovery of Productive Itemsets) is presented, storing transactions in a tree structure, which can then be interactively queried to identify productive and non redundant itemsets containing specific items. A structure named Query-Tree is also introduced to process many queries at the same time. Moreover, to handle dynamic databases, efficient transaction insertion and deletion algorithms are provided to update the tree. It was observed in an experimental evaluation on benchmark datasets containing various types of data that IDPI+ can handle thousands of queries per second on a desktop computer. Moreover, it was found that IPDI+ is more than an order of magnitude faster than a baseline algorithm.
更多
查看译文
关键词
Itemsets,Data mining,Heuristic algorithms,Market research,Task analysis,Benchmark testing,Dynamic database,itemset mining,nonredundant itemsets,productive itemsets,query-tree,significant patterns
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要