The Boundary Between Privacy and Utility in Data Publishing

Very Large Data Bases (2007)

Cited 323 | Views 49
Abstract
We consider the privacy problem in data publishing: given a database instance containing sensitive information, anonymize it to obtain a view such that, on the one hand, attackers cannot learn any sensitive information from the view, and, on the other hand, legitimate users can use it to compute useful statistics. These are conflicting goals. In this paper we prove an almost crisp separation between the cases in which a useful anonymization algorithm is possible and those in which it is not, based on the attacker's prior knowledge. Our definition of privacy is derived from the existing literature and relates the attacker's prior belief for a given tuple t to the posterior belief for the same tuple. Our definition of utility is based on the error bound on the estimates of counting queries. The main result has two parts. First, we show that if the prior beliefs for some tuples are large, then no useful anonymization algorithm exists. Second, we show that when the prior is bounded for all tuples, there exists an anonymization algorithm that is both private and useful. The anonymization algorithm behind our positive result is novel and improves on the privacy/utility tradeoff of previously known algorithms with privacy/utility guarantees, such as FRAPP.
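The randomized-perturbation approach the abstract alludes to (FRAPP belongs to this family) can be sketched as follows. This is an illustrative assumption, not the paper's actual algorithm: each value is retained with probability p and otherwise replaced by a uniform draw from the domain, and a legitimate user corrects the observed count of a predicate to obtain an unbiased estimate of the true count. The function names `perturb` and `estimate_count` are invented for this sketch.

```python
import random

def perturb(data, domain, p, rng=random):
    """Uniform-replacement perturbation: keep each value with probability p,
    otherwise substitute a value drawn uniformly at random from the domain."""
    return [x if rng.random() < p else rng.choice(domain) for x in data]

def estimate_count(perturbed, predicate, domain, p):
    """Unbiased estimate of how many original records satisfy `predicate`.

    A record matching the predicate still matches after perturbation with
    probability p + (1 - p) * s/d, and a non-matching record matches with
    probability (1 - p) * s/d, where s is the number of matching domain
    values and d the domain size; inverting the expectation gives:
    """
    n = len(perturbed)
    s = sum(1 for v in domain if predicate(v))      # predicate's domain selectivity
    obs = sum(1 for v in perturbed if predicate(v))  # count on the published view
    return (obs - n * (1 - p) * s / len(domain)) / p
```

The estimate is unbiased but noisy; its error grows as p shrinks, which is exactly the privacy/utility tension the paper quantifies.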
Keywords
anonymization algorithm, prior belief, useful anonymization algorithm, sensitive information, prior knowledge, privacy problem, useful statistics, privacy/utility tradeoff, data publishing