Maximally informative k-itemsets and their efficient discovery.
KDD (2006)
Abstract
In this paper we present a new approach to mining binary data. We treat each binary feature (item) as a means of distinguishing two sets of examples. Our interest is in selecting, from the total set of items, an itemset of specified size such that the database is partitioned with as uniform a distribution over the parts as possible. To achieve this goal, we propose the use of joint entropy as a quality measure for itemsets, and refer to optimal itemsets of cardinality k as maximally informative k-itemsets. We claim that this approach maximises distinctive power, as well as minimises redundancy within the feature set. Several algorithms are presented for computing optimal itemsets efficiently.
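To make the quality measure concrete, here is a minimal sketch, assuming each transaction is a Python set of the items it contains. A k-itemset splits the rows into up to 2^k parts according to each row's 0/1 pattern on those items, and the joint entropy H = -Σ p(x) log2 p(x) (in bits; the base-2 choice is an assumption) is largest when those parts are equally sized. The names `joint_entropy` and `best_k_itemset` are hypothetical. The search shown is plain exhaustive enumeration, a baseline only, and not one of the efficient algorithms the paper proposes.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(rows, itemset):
    """Joint entropy (bits) of the binary items in `itemset` over `rows`.

    Each row is a set of present items; rows are grouped by their
    presence/absence pattern on the chosen items.
    """
    n = len(rows)
    counts = Counter(tuple(item in row for item in itemset) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def best_k_itemset(rows, items, k):
    """Brute-force baseline: try every k-subset and keep the first one
    with maximal joint entropy (exponential in k, for illustration only)."""
    return max(combinations(sorted(items), k),
               key=lambda s: joint_entropy(rows, s))


# Hypothetical toy data: item "d" occurs exactly where "a" does,
# so it adds no information once "a" has been chosen.
rows = [{"a", "b", "d"}, {"a", "d"}, {"b", "c"}, {"c"},
        {"a", "c", "d"}, set(), {"a", "b", "c", "d"}, {"b"}]
items = {"a", "b", "c", "d"}

print(joint_entropy(rows, ("a", "d")))  # redundant pair: 1.0 bit
print(best_k_itemset(rows, items, 3))   # ('a', 'b', 'c'): 3.0 bits
```

The redundant pair {a, d} reaches only 1 bit, while {a, b, c} splits the eight rows into eight singleton parts (3 bits). This illustrates how maximising joint entropy both favours distinctive items and avoids redundant ones.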