SparkDA - RDD-Based High-Performance Data Anonymization Technique for Spark Platform.

NSS(2019)

Abstract
Recent proposals in data anonymization have mostly focused on MapReduce, although the advantages of Spark are well documented. To address this gap, we propose a novel data anonymization technique for Apache Spark. SparkDA, our proposal, takes full advantage of Spark features such as better partition control, in-memory processing, and cache management for iterative operations, while providing high data utility with privacy. This is achieved by building the data anonymization algorithms on Spark's Resilient Distributed Dataset (RDD). The algorithms are implemented in two main data processing RDD transformations, FlatMapRDD and ReduceByKeyRDD. Our experimental results show that the proposed approach provides the required levels of data privacy and utility while offering the scalability and high performance that are essential for many large datasets.
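The abstract names two transformation stages but not their internals. The sketch below only illustrates the general flatMap-then-reduceByKey pattern with a simple k-anonymity check; it is not the SparkDA algorithm. It uses plain Python stand-ins for the two transformations so that it runs without a Spark cluster (in PySpark the same steps would map onto `rdd.flatMap` and `rdd.reduceByKey`). The records, the generalization rule, and all function names are hypothetical.

```python
# Hypothetical records: (age, zip_code, diagnosis). Not taken from the paper.
RECORDS = [
    (23, "47677", "flu"),
    (27, "47602", "cold"),
    (29, "47678", "flu"),
    (41, "47905", "asthma"),
    (45, "47909", "flu"),
    (62, "48201", "cold"),
]


def generalize(record):
    """flatMap-style step: emit (generalized quasi-identifiers, [sensitive value])."""
    age, zip_code, diagnosis = record
    decade = age // 10 * 10
    # Assumed generalization rule: 10-year age bands and 3-digit ZIP prefixes.
    key = (f"{decade}-{decade + 9}", zip_code[:3] + "**")
    return [(key, [diagnosis])]


def flat_map(func, items):
    """Local stand-in for a flatMap transformation."""
    return [pair for item in items for pair in func(item)]


def reduce_by_key(func, pairs):
    """Local stand-in for a reduceByKey transformation."""
    grouped = {}
    for key, value in pairs:
        grouped[key] = func(grouped[key], value) if key in grouped else value
    return grouped


def anonymize(records, k):
    """Group records into equivalence classes and drop classes smaller than k."""
    pairs = flat_map(generalize, records)
    classes = reduce_by_key(lambda a, b: a + b, pairs)
    return {key: values for key, values in classes.items() if len(values) >= k}


if __name__ == "__main__":
    for key, values in anonymize(RECORDS, k=2).items():
        print(key, values)
```

With `k=2`, the single record in the 60-69 band is suppressed because its equivalence class contains fewer than two records.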
Keywords
High-performance, Data anonymization, Spark, Big data mining, Privacy and utility