A novel predictive modeling framework: combining association rule discovery with EM algorithm (2013)

Abstract
Building predictive models and finding patterns are two fundamental problems in data mining, and this thesis contributes to both. In recent years, there have been increasing efforts to apply association rule mining to build predictive models, which have resulted in the areas of Associative Classification (AC) and Associative Regression (AR). The first major contribution of this thesis is a novel predictive modeling framework that can be applied to build both AC and AR models. The resulting classification and regression models are called ACEM and AREM, respectively. ACEM/AREM derives a set of classification/regression rules by: (i) applying an instance-based approach to mine the itemsets that form the rules' left-hand sides, and (ii) developing a probabilistic model that determines, for each mined itemset, the corresponding rule's parameters. The key contributions of ACEM/AREM are a probabilistic model that captures interactions among itemsets and an expectation-maximization (EM) algorithm derived to learn the rule parameters. An extensive experimental evaluation shows that the EM optimization can improve predictive performance dramatically. We also show that ACEM/AREM can perform better than some state-of-the-art classification/regression models. The second major contribution of this thesis is the development of effective pruning methods that lead to efficient algorithms for two pattern mining problems. The first pattern mining problem is the instance-based itemset mining of ACEM/AREM. ACEM/AREM uses an Instance-Based Itemset Miner (IBIMiner) algorithm to discover the best itemsets for each training instance. IBIMiner incorporates various methods to bound the quality of any future extension of the itemset under consideration. Our experiments show that these bounds allow IBIMiner to prune the search space considerably.
The second pattern mining problem is the extension of association rule mining to dyadic datasets, that is, datasets whose features are naturally partitioned into two groups of distinct types. Traditional association rule mining methods employ metrics (e.g., confidence) that fail to distinguish the two types of features. We address this problem by proposing a new metric called dual-lift that captures the interaction between features. Based on this metric, we formulate a constraint pattern mining problem, which is solved by an efficient algorithm that pushes various constraints deeply into the rule mining process. We apply the dual-lift mining formulation to real-world applications and report interesting results.
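For readers unfamiliar with the traditional metrics mentioned above, the following is a minimal sketch of how support, confidence, and the standard lift of a rule X -> Y are usually computed. It is not the thesis's dual-lift metric, which the abstract does not define. The function names and the toy transactions are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """Standard confidence: support(lhs and rhs together) / support(lhs)."""
    s_lhs = support(lhs, transactions)
    return support(lhs | rhs, transactions) / s_lhs if s_lhs > 0 else 0.0


def lift(lhs: FrozenSet[str], rhs: FrozenSet[str],
         transactions: List[Transaction]) -> float:
    """Standard lift: confidence(lhs -> rhs) / support(rhs)."""
    s_rhs = support(rhs, transactions)
    return confidence(lhs, rhs, transactions) / s_rhs if s_rhs > 0 else 0.0


# Hypothetical toy data, not taken from the thesis.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_lhs = frozenset({"bread"})
rule_rhs = frozenset({"milk"})
print(confidence(rule_lhs, rule_rhs, transactions))
print(lift(rule_lhs, rule_rhs, transactions))
```

Both metrics treat every item identically, which is the limitation the abstract points to for dyadic datasets: nothing in these formulas records which of the two feature groups an item belongs to.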
Keywords
data mining, traditional association rule mining, predictive model, constraint pattern mining problem, itemset mining, association rule mining, pattern mining problem, EM algorithm, association rule discovery, efficient algorithm, rule mining process, dual-lift mining formulation, novel predictive modeling framework