Bottom-Up Generalization: A Data Mining Solution to Privacy Protection

ICDM(2004)

引用 472|浏览417
暂无评分
摘要
The well-known privacy-preserved data mining modifies existing data mining techniques to randomized data. In this paper, we investigate data mining as a technique for masking data, therefore, termed data mining based privacy protection. This approach incorporates partially the requirement of a targeted data mining task into the process of masking data so that essential structure is preserved in the masked data. The idea is simple but novel: we explore the data generalization concept from data mining as a way to hide detailed information, rather than discover trends and patterns. Once the data is masked, standard data mining techniques can be applied without modification. Our work demonstrated another positive use of data mining technology: not only can it discover useful patterns, but also mask private information. We consider the following privacy problem: a data holder wants to release a version of datafor building classification models, but wants to protect against linking the released data to an external source for inferring sensitive information. We adapt an iterative bottom-up generalization from data mining to generalize the data. The generalized data remains useful to classification but becomes difficult to link to other sources. The generalization space is specified by a hierarchical structure of generalizations. A key is identifying the best generalization to climb up the hierarchy at each iteration. Enumerating all candidate generalizations is impractical. We present a scalable solution that examines at most one generalization in each iteration for each attribute involved in the linking.
更多
查看译文
关键词
data generalization concept,data mining,data mining solution,bottom-up generalization,data mining technology,randomized data,privacy protection,data mining technique,masking data,targeted data mining task,generalized data,standard data mining technique,data holder,private information,bottom up,data privacy
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要