Anonymizing Tables for Privacy Protection

msra

Citations: 23 | Views: 27
Abstract
We consider the problem of releasing tables containing personal records while ensuring individual privacy and data integrity. One of the techniques proposed in the literature is k-anonymization. A release is considered k-anonymous if the information for each person contained in the release cannot be distinguished from at least k - 1 other persons whose information also appears in the release. We show that the problem of k-anonymization is NP-hard even when the attribute values are ternary. On the positive side, we give an O(k)-approximation algorithm for the problem. This improves upon the previous best known O(k log k)-approximation (MW04). In addition, we give a 1.5-approximation algorithm for the special case of 2-anonymity, and a 2-approximation algorithm for 3-anonymity.

The information age has witnessed a huge growth in the amount of personal data that can be collected and analyzed. This has led to an increasing use of data mining tools with the basic goal of inferring trends in order to predict the future. However, this goal conflicts with the desire for privacy of personal data. In many scenarios, access to large amounts of personal data is essential in order for accurate inferences to be drawn. For example, hospitals might wish to collaborate in order to catch the outbreak of epidemics in its early stages. This requires them to allow access to medical records of their patients. In such cases, one would like to provide data in a manner that enables one to draw inferences without violating the privacy of individual records. One approach is to suppress some of the sensitive data values. This ensures complete data integrity, i.e., inferences can be made with 100% confidence (as compared to perturbation techniques). We study the k-anonymity model which was proposed by Samarati and Sweeney (Swe02, SS98). Consider a database with n rows and m columns in which each entry comes from a finite alphabet
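The k-anonymity condition described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only checks whether a released table is k-anonymous and shows how suppressing entries can make it so. The names `is_k_anonymous`, `suppress`, and `SUPPRESSED`, the `*` marker, the example table, and the choice to count suppressed entries as the cost are all assumptions made for illustration.

```python
from collections import Counter

SUPPRESSED = "*"  # hypothetical marker for a suppressed entry


def is_k_anonymous(rows, k):
    """Return True if every row is identical to at least k - 1 other rows."""
    counts = Counter(tuple(row) for row in rows)
    return all(count >= k for count in counts.values())


def suppress(rows, cells):
    """Return a copy of rows with the given (row, column) entries suppressed."""
    released = [list(row) for row in rows]
    for r, c in cells:
        released[r][c] = SUPPRESSED
    return [tuple(row) for row in released]


# Illustrative table: n = 4 rows, m = 3 columns, ternary alphabet {0, 1, 2}.
table = [
    ("1", "0", "2"),
    ("1", "0", "1"),
    ("0", "1", "2"),
    ("0", "1", "2"),
]

# Suppress the last column of the first two rows so that they become identical.
cells = [(0, 2), (1, 2)]
released = suppress(table, cells)
cost = len(cells)  # assumed cost measure: number of suppressed entries

print(is_k_anonymous(table, 2))     # False: the first two rows are each unique
print(is_k_anonymous(released, 2))  # True: every row now has an identical twin
print(cost)                         # 2
```

The sketch checks a given suppression only; choosing which entries to suppress at low cost is the hard part, which is the problem the abstract shows to be NP-hard.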