Learning Semantically Robust Rules from Data

msra(2004)

引用 23|浏览8
暂无评分
摘要
We introduce the problem of mining robust rules, which are expressive multi-dimensional generalized association rules. Consider a large relational table, where associated with each attribute is a hierarchy whose base values are those originally represented in the data, and values appearing at higher levels in the hierarchy represent increasingly more general concepts of base values. Attribute hierarchies provide meaningful levels of concept aggregation, such as the encoding of postal codes (ZIP) or dates, or the taxonomy of products. We find the least general rules formed by combining mixed levels of generalizations across attributes to convey the maximum expression of information supported by attribute hierarchies, parameter settings and data tuples. We term these "robust rules" and introduce a GenTree algorithm as a means to learn robust rules from a table. An example of a robust rule from a table having base values {5-digit ZIP, gender, registration date (year/month/day), party} might be "women living in Cambridge (021**) and registered in the 1970's (197*/xx/xx) tend to be Democrats." Previous studies on mining generalized association rules have been limited dimensionally (e.g., transactional data), by data type (e.g., quantitative data), and/or to rules expressed from either fixed-level or non-semantic abstractions. Such approaches limit the kinds of rules that can be learned. Experiments using GenTree with two real-world datasets, containing 10,000 six-attributed tuples and over 4,000 eight-attributed tuples each, show that learned rules convey more comprehensive information than possible with traditional association rule mining algorithms, because traditional approaches limit the expressivity of the rules they generate.
更多
查看译文
关键词
association rule mining,association rules,data mining,data type,transaction data
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要