An Entropy-Based Analytic Model For The Privacy-Preserving In Open Data

2016 IEEE International Conference on Big Data (Big Data), 2016

Abstract
In the Big Data era, many open data sets are published and shared with the public, creating new services and businesses. However, publication may leak private information. De-identification techniques are generally applied to the data before publication, but they do not solve the problem completely. Personal data can also be obtained from several other sources, such as Internet services and social media. A de-identified open data set can then simply be joined with such leaked external data, which may lead to re-identification. We propose a new analytic model that measures the risk of personal information leakage in open data before it is published. The model formulates an entropy-based re-identification risk to quantify the privacy leakage risk. We also use entropy to define a data utility measure that applies while privacy is preserved. Based on both the risk and the utility measures, we propose a guideline for opening data to the public. Empirical experiments show that the guideline, including the risk and utility measurements, is applicable in practice.
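The abstract does not give the model's formulas, so the following Python sketch only illustrates the general idea of an entropy-based risk and utility measure. It is not the paper's method. The function names, the toy records, and the choice of quasi-identifier columns are all assumptions made for illustration. The sketch assumes an attacker who joins on quasi-identifiers and treats every record in the matching group as equally likely.

```python
import math
from collections import Counter


def quasi_identifier_entropy(records, quasi_identifiers):
    """Shannon entropy (bits) of the quasi-identifier combinations.

    Illustrative assumption: this is the information an attacker gains
    by learning a person's quasi-identifier values.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    n = len(keys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(keys).values())


def residual_entropy(records, quasi_identifiers):
    """Expected uncertainty (bits) left about which record is the target.

    Records sharing the same quasi-identifier values form a group; within
    a group of size s the attacker is assumed to face log2(s) bits of doubt.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    n = len(keys)
    return sum((s / n) * math.log2(s) for s in Counter(keys).values())


# Hypothetical de-identified table (generalized ZIP code and age band).
records = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "130**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "asthma"},
]

qi = ["zip", "age"]
leaked_bits = quasi_identifier_entropy(records, qi)   # 1.5 bits
remaining_bits = residual_entropy(records, qi)        # 0.5 bits
max_bits = math.log2(len(records))                    # 2.0 bits

# Hypothetical normalized risk score: 0 means no record is narrowed down,
# 1 means every record is uniquely identified by its quasi-identifiers.
risk = 1 - remaining_bits / max_bits                  # 0.75
```

In this toy setting the two quantities sum to log2 of the number of records, so coarser generalization lowers the risk score and also lowers the information retained in the quasi-identifiers. That is one simple way to see the risk and utility trade-off the abstract refers to.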
Key words
privacy, open data, re-identification, information entropy, privacy preserving data mining