The Boundary Between Privacy and Utility in Data Anonymization

Clinical Orthopaedics and Related Research (2006)

Abstract
We consider the privacy problem in data publishing: given a relation I containing sensitive information, "anonymize" it to obtain a view V such that, on the one hand, attackers cannot learn any sensitive information from V and, on the other hand, legitimate users can use V to compute useful statistics on I. These are conflicting goals. We use a definition of privacy that is derived from existing ones in the literature, which relates the a priori probability of a given tuple t, Pr(t), to the a posteriori probability, Pr(t|V), and we propose a novel and quite practical definition of utility. Our main result is the following. Let n denote the size of I and m the size of the domain from which I was drawn (i.e., n < m). When the a priori probability is Pr(t) = Ω(n/√m) for some tuples t, there exists no useful anonymization algorithm; when Pr(t) = O(n/m) for all tuples t, we give a concrete anonymization algorithm that is both private and useful. Our algorithm is quite different from the k-anonymization algorithm studied intensively in the literature, and is based on random deletions from and insertions into I.
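
The idea of anonymizing by random deletions and insertions can be illustrated with a minimal Python sketch. This is not the authors' algorithm as specified in the paper: the function names, the parameters `keep_prob` and `insert_prob`, and the counting estimator are all assumptions introduced here for illustration, and the sketch says nothing about which parameter values meet the privacy definition.

```python
import random


def randomize(instance, domain, keep_prob, insert_prob, rng=None):
    """Build a view V from I by random deletions and insertions.

    Hypothetical sketch: each tuple of I is kept with probability
    keep_prob (so it is deleted with probability 1 - keep_prob), and
    each domain tuple NOT in I is inserted with probability insert_prob.
    """
    rng = rng or random.Random()
    instance = set(instance)
    view = set()
    for t in domain:
        if t in instance:
            if rng.random() < keep_prob:
                view.add(t)
        elif rng.random() < insert_prob:
            view.add(t)
    return view


def estimate_count(view, predicate, domain, keep_prob, insert_prob):
    """Estimate how many tuples of I satisfy predicate, using only V.

    Assumed estimator: E[observed] = keep_prob * c + insert_prob * (q - c),
    where c is the true count in I and q is the number of domain tuples
    satisfying the predicate. Solving for c gives the formula below.
    Requires keep_prob != insert_prob.
    """
    q = sum(1 for t in domain if predicate(t))
    observed = sum(1 for t in view if predicate(t))
    return (observed - insert_prob * q) / (keep_prob - insert_prob)
```

A legitimate user who knows the two probabilities could call `estimate_count` on the published view to approximate a counting statistic on I, while any single tuple appearing in the view may be either genuine or inserted noise.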