A divide-and-conquer approach to privacy-preserving high-dimensional big data release

Journal of Information Security and Applications (2024)

Data anonymization has been used extensively in data-sharing scenarios to protect the privacy of people’s raw data. However, in the era of Big Data, the amount of data released has increased so rapidly that most existing data anonymization approaches have become ineffective, because they do not scale to large data sets. In addition, these approaches cannot handle the sparsity of the high-dimensional search space. In this paper, we propose a MapReduce-based approach to the anonymization of high-dimensional big data. First, our approach uses a vertical partition criterion based on normalized mutual information to decompose the raw data into fragments of smaller dimensionality. Then, clustering-based local recoding groups the records of each fragment into clusters. During this phase, records with similar values of quasi-identifier attributes but dissimilar values of sensitive attributes tend to be grouped together. Finally, the clusters of each fragment are anonymized to resist both (1) disclosure of individual identities and (2) proximity breaches. The proposed approach is integrated with MapReduce to enable parallel distributed computing. Experiments on three public data sets demonstrated that our approach outperformed the compared approaches in terms of efficiency and scalability.
Key words
Data anonymization, High-dimensional big data, Vertical partition, Local recoding, MapReduce
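
The vertical partition step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so everything below is an assumption made for illustration only: the square-root normalization of mutual information (one common variant), the hypothetical `threshold` parameter, and the greedy grouping rule in the hypothetical function `vertical_partition`. It is not the authors' algorithm and it omits the MapReduce integration.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (natural log) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def normalized_mutual_information(xs, ys):
    """NMI(X;Y) = I(X;Y) / sqrt(H(X) * H(Y)).

    Assumption: square-root normalization; other normalizations exist.
    Returns 0.0 when either attribute is constant (zero entropy).
    """
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0
    # I(X;Y) = H(X) + H(Y) - H(X,Y); clamp tiny negative rounding errors.
    mi = max(0.0, hx + hy - entropy(list(zip(xs, ys))))
    return mi / (hx * hy) ** 0.5


def vertical_partition(columns, threshold):
    """Greedy grouping (illustrative assumption, not the paper's criterion).

    Each attribute joins the first fragment whose members all have
    NMI >= threshold with it; otherwise it starts a new fragment.
    `columns` maps attribute name -> list of values, one per record.
    """
    fragments = []
    for name, values in columns.items():
        for fragment in fragments:
            if all(
                normalized_mutual_information(values, columns[member]) >= threshold
                for member in fragment
            ):
                fragment.append(name)
                break
        else:
            fragments.append([name])
    return fragments


# Toy data: "age" and "zip" are perfectly dependent, "sex" is independent.
toy_columns = {
    "age": [1, 1, 2, 2],
    "zip": ["a", "a", "b", "b"],
    "sex": ["m", "f", "m", "f"],
}
toy_fragments = vertical_partition(toy_columns, threshold=0.5)
```

In this toy example, strongly dependent attributes end up in the same low-dimensional fragment, which reflects the general idea of decomposing the raw data into fragments of smaller dimensionality before clustering.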