An Entropy-Based Analytic Model For The Privacy-Preserving In Open Data

2016 IEEE International Conference on Big Data (Big Data), 2016

Abstract
In the Big Data era, many open data sets are published and shared with the public, which creates new services and businesses. However, publication may leak private information. De-identification techniques are generally applied to the data before publication, but they do not solve the problem completely. Personal data can also be obtained from several other sources, such as Internet services and social media. In this situation, a de-identified open data set can simply be joined with leaked external data, which may result in re-identification. We propose a new analytic model to measure the risk of personal information leakage in open data before it is published. The proposed model formulates an entropy-based re-identification risk to measure the privacy leakage risk. We also seek an entropy-based measure of data utility that applies while privacy is preserved. Based on both the risk and the utility measures, we propose a guideline for opening data to the public. Empirical experiments show that the guideline, including the risk and utility measurements, is applicable in practice.
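The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that records are grouped by their quasi-identifier values, that Shannon entropy is taken over the resulting group sizes, and that worst-case risk is one over the smallest group size. The function names, field names, and sample records are all hypothetical.

```python
import math
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the distribution over equivalence classes."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_reidentification_risk(counts):
    """Assumed worst-case risk: 1 / size of the smallest equivalence class."""
    return 1.0 / min(counts.values())


# Hypothetical de-identified records (generalized age and ZIP code).
records = [
    {"age": "30-39", "zip": "123**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "123**", "diagnosis": "cold"},
    {"age": "40-49", "zip": "456**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "456**", "diagnosis": "asthma"},
]

classes = equivalence_classes(records, ["age", "zip"])
entropy_bits = shannon_entropy(classes)
risk = max_reidentification_risk(classes)
print(f"entropy = {entropy_bits:.3f} bits, worst-case risk = {risk:.2f}")
```

In this toy data each quasi-identifier combination is shared by two records, so an attacker who joins on age and ZIP code can narrow a target down to two candidates at best. Coarser generalization would lower the risk but also reduce how much the published attributes distinguish records, which is the risk and utility trade-off the abstract refers to.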
Keywords
privacy, open data, re-identification, information entropy, privacy preserving data mining