Assessing Data Intrusion Threats: Response
Science (2015)
Abstract
Barth-Jones ).
A simple and real example of our attack model is a bank sharing metadata for its 1.1 million customers in anonymized form with a third party for analysis. If the third party is able to obtain additional information, such as loyalty program data if the third party is a retailer, that data could be used to reidentify an individual and all the rest of his or her purchases.
Barth-Jones et al.'s Letter exemplifies the intrinsic issue with deidentification. One can always, as Barth-Jones et al. have, artificially lower the estimated likelihood of reidentification through the use of arbitrary and debatable assumptions.
First, Barth-Jones ). This is an unrealistic definition of breach of privacy. Second, Barth-Jones et al. assume that it is “very unlikely” for an attacker to be able to collect geolocalized information about an individual. At best, this is a striking underestimation of the current availability of identified data. Possible sources would include manually collected clues about an individual we know (e.g., receipts or branded shopping bags); having access to, or collecting from public profiles, people's check-ins at shops or restaurants on Yelp, Foursquare, or Facebook; or having access to a retailer's database or to a database of geolocalized information such as the one collected by smartphone applications, WiFi companies, and virtually any carrier in the world. Third, Barth-Jones et al. assume that an attacker cannot know whether an individual is a client of a bank and is therefore in the data set. This is again an assumption that artificially lowers the estimated, and thus perceived, risk of reidentification without changing at all the actual risk for people in the released data set. Fourth, the fact that an individual might occasionally pay cash only means that an attacker would need a few more points.
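The argument above rests on unicity: how few known (shop, day) points single out one record in a release, and why cash purchases merely raise that number. The following added example (not code from the letter or its authors) measures this on a constructed toy release of four records; the record ids, shops, days and the assumption that an observer can tell cash from card purchases are all made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Constructed toy release: each anonymized record is the set of (shop, day)
# points at which that customer paid by card. Not data from the letter.
RELEASE = {
    "r1": {("bakery", 1), ("cafe", 2), ("grocery", 3), ("cinema", 4)},
    "r2": {("bakery", 1), ("cafe", 2), ("pharmacy", 5), ("bookshop", 6)},
    "r3": {("bakery", 1), ("cafe", 2), ("grocery", 3), ("gym", 7)},
    "r4": {("bakery", 1), ("pharmacy", 5), ("gym", 7), ("cinema", 4)},
}


def candidates(release, known):
    """Records consistent with every known (shop, day) point."""
    return {rid for rid, pts in release.items() if known <= pts}


def unicity(release, p):
    """Expected fraction of records pinned down by p of their own points,
    averaged over every p-subset of every record (exact Fraction)."""
    hits = trials = 0
    for rid, pts in release.items():
        for subset in combinations(sorted(pts), p):
            trials += 1
            hits += candidates(release, set(subset)) == {rid}
    return Fraction(hits, trials)


def observations_needed(release, target, observed):
    """Walk an observed sequence of (point, paid_by_card) about `target`
    and return how many observations it takes until only the target's
    record remains, or None if it never becomes unique. Cash purchases
    are absent from the release, so they cost an observation but add
    nothing to the match (assumes the observer can tell cash from card)."""
    known = set()
    for count, (point, paid_by_card) in enumerate(observed, start=1):
        if paid_by_card:
            known.add(point)
        if candidates(release, known) == {target}:
            return count
    return None


if __name__ == "__main__":
    for p in range(1, 5):
        print(f"p={p}: unicity {unicity(RELEASE, p)}")  # 1/16, 11/24, 7/8, 1
    card_only = [(("bakery", 1), True), (("cafe", 2), True), (("cinema", 4), True)]
    with_cash = card_only[:2] + [(("lunch", 2), False)] + card_only[2:]
    print(observations_needed(RELEASE, "r1", card_only))  # 3
    print(observations_needed(RELEASE, "r1", with_cash))  # 4
```

Even in this tiny release, three points already single out most records, and inserting a cash purchase into the observed sequence shifts the answer from 3 to 4 observations without changing the risk for the card points themselves.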
Estimated probabilities of reidentification are not a useful basis for policy, and we stand by our comment that “the open sharing of raw [deidentified metadata] data sets is not the future”.