Trail re-identification and unlinkability in distributed databases

Trail re-identification and unlinkability in distributed databases(2006)

引用 29|浏览15
暂无评分
摘要
The term "re-identification" refers to the correct relation of seemingly anonymous data to explicit identifying information, such as the name or address of people who are the subjects of the data. Historically, re-identification methods were evaluated with respect to a single provider's disclosed databases. Imagine that a data holder discloses two databases: one consists of unidentified data, such as DNA sequences and the other contains identifiable data, such as personal names. The databases appear unrelated and, as a result, existing data protection policies certify sufficient protection from re-identification. However, when multiple locations make such releases available, an individual's data can be tracked across locations, resulting in a location-visit footprint, or a trail. Unique trails can be leveraged for re-identification. In this dissertation, traditional notions of privacy are extended to account for trail re-identification. The goals of this dissertation are: (1) to prove data protection policies can and do fail when they rely on ad-hoc strategies and (2) to demonstrate policies can be strengthened with formal computational models for privacy technology. In this work, trails are studied in two principle parts. First, we concentrate on the trail re-identification problem and develop several learning algorithms for discovering re-identifications. The algorithms are evaluated on populations derived from real world databases, including hospital visits derived from medical databases and weblogs derived from Internet databases. It is demonstrated that susceptibility to trail re-identification is neither trivial nor the result of bizarre isolated occurrences. Experimental evidence with real world populations confirms that significant quantities of populations are at trail re-identification risk. Second, we propose a protocol by which data holders can collaborate to provably prevent trail re-identification. To do so, we introduce a formal model of privacy called k-unlinkability, and several configurable algorithms to render protected trails. To satisfy real world policy constraints, we present a novel secure multiparty computation protocol that embeds the protection procedure. Using real world datasets, it is demonstrated that significant quantities of data can be disclosed with provable privacy guarantees.
更多
查看译文
关键词
unidentified data,data protection policy,anonymous data,trail re-identification,identifiable data,re-identification method,trail re-identification risk,data holder,trail re-identification problem,significant quantity
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要