PACAS: Privacy-Aware, Data Cleaning-as-a-Service
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)(2018)
摘要
Data cleaning consumes up to 80% of the data analysis pipeline. This is a significant overhead for organizations where data cleaning is still a manually driven process requiring domain expertise. Recent advances have fueled a new computing paradigm called Database-as-a-Service, where data management tasks are outsourced to large service providers. We propose a new Data Cleaning-as-a-Service model that allows a client to interact with a data cleaning provider who hosts curated, and sensitive data. We present PACAS: a Privacy-Aware data Cleaning-As-a-Service framework that facilitates communication between the client and the service provider via a data pricing scheme where clients issue queries, and the service provider returns clean answers for a price while protecting her data. We propose a practical privacy model in such interactive settings called (X,Y,L)-anonymity that extends existing data publishing techniques to consider the data semantics while protecting sensitive values. Our evaluation over real data shows that PACAS effectively safeguards semantically related sensitive values, and provides improved accuracy over existing privacy-aware cleaning techniques.
更多查看译文
关键词
data quality, data cleaning, data privacy
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络