Optimized Data Sharing with Differential Privacy: A Game-theoretic Approach

Semantic Scholar (2021)

Abstract
We study and optimize the differentially private learning outcomes from data shared amongst multiple separate data owners, according to the classical privacy-versus-accuracy trade-off, using a game-theoretic approach. A dynamic non-cooperative game with imperfect information provides this optimal trade-off, with differentially private models that learn from the data. In this model, we make an optimal choice of the privacy budget parameter of the Laplace mechanism, according to pure differential privacy. The data analysis model uses differentially private gradient queries, as privacy-aware supervised machine learning. We then use non-cooperative game theory to analyse and optimize the utility-leakage trade-off, minimizing learning loss and achieving a unique Nash equilibrium (mutual best response). We quantify the quality of the trained model with a novel method that captures the trade-off between privacy and utility (accuracy). Our method uses fixed-point theory in gradient-descent learning to predict the contraction mapping of the outcomes. We validate the collaborative learning method, applied with our non-cooperative game, on a partitioned real financial dataset, demonstrating the benefits of sharing data for all data owners, with significant gains in social welfare from applying our game.

Introduction

To enhance the efficiency and capacity of the Internet of Things (IoT), edge computing allows data to be transferred and processed at the edge of the network, such as at a cloud aggregator or on end devices (Dwork and Pappas 2017; Shi et al. 2016). In such networks, data analysis methods using machine learning (ML) can unlock valuable insights for improving revenue or quality of service from potentially proprietary, private datasets (Hunt et al. 2018; Graepel, Lauter, and Naehrig 2012). The shared information from the data owners in a particular IoT network can then contribute to training ML models.
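The differentially private gradient queries described above are built on the Laplace mechanism. A minimal Python sketch, assuming an L1 clipping bound C and treating the query's sensitivity as known (both are illustrative assumptions, not the paper's exact calibration):

```python
import numpy as np

def laplace_query(true_answer, sensitivity, epsilon, rng=None):
    """Release a query answer with pure epsilon-DP via the Laplace mechanism."""
    rng = rng or np.random.default_rng(0)
    scale = sensitivity / epsilon                 # noise scale b = Delta / epsilon
    return true_answer + rng.laplace(0.0, scale, np.shape(true_answer))

# A data owner answering a gradient query: clip the gradient's L1 norm to C
# so the released quantity has bounded sensitivity, then add Laplace noise.
grad = np.array([0.8, -1.2, 0.4])                 # gradient computed on local data
C = 1.0                                           # assumed L1 clipping bound
clipped = grad * min(1.0, C / np.abs(grad).sum())
noisy_grad = laplace_query(clipped, sensitivity=2 * C, epsilon=1.0)
```

A larger privacy budget epsilon shrinks the noise scale, which is exactly the relaxation the data owners are compensated for later in the paper.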
Due to the nature of learning, large high-quality datasets improve the quality of trained ML models in terms of the accuracy of predictions on potentially untested data (Dwork, Roth et al. 2014). These improvements in quality motivate multiple data owners to share and merge their datasets in order to create larger training datasets for federated training (Li et al. 2017; Konečnỳ et al. 2016). For instance, financial institutions may wish to merge their transaction or lending datasets to improve the quality of trained ML models for fraud detection or for computing interest rates. However, the information shared between data owners will inevitably contain sensitive data which the owners wish to protect. As such data owners are independent of each other, each will be concerned about its own data safety in a collaborative learning setting. We consider multiple learners that aim to train separate privacy-aware ML models with similar structures, based on their own datasets and on differentially private (DP) responses from other learners and private data owners, as shown in Figure 1. Each data owner trains a separate ML model and sends differentially private responses to the other participants for collaborative learning. This is similar to distributed ML on arbitrary connected graphs, and in this way we can extend the results to more general communication structures with the learner not necessarily at the center. Note that the latter configuration, where the communication structure among the learner and the data owners is a star graph with the learner at the center, has been considered in prior research, e.g. (Wu et al. 2020; Farokhi et al. 2020). In this paper we first use Banach fixed-point theory to obtain a more accurate prediction of the learning loss (Jung 2017).

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
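The role of Banach fixed-point theory can be sketched as follows: for least-squares learning, the gradient-descent update is an affine map whose contraction factor ρ < 1 is computable from the data, so the distance to the fixed point (the trained parameter) after t steps is bounded by ρ^t times the initial distance. A self-contained illustration on synthetic data (the dataset and step-size rule are assumptions for the sketch, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                     # synthetic design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

n = len(X)
H = X.T @ X / n                                   # Hessian of the least-squares loss
eigs = np.linalg.eigvalsh(H)
eta = 1.0 / eigs.max()                            # step size keeping the map contractive
rho = max(abs(1 - eta * eigs.min()),
          abs(1 - eta * eigs.max()))              # contraction factor of the update map

w_star = np.linalg.lstsq(X, y, rcond=None)[0]     # unique fixed point of the update
w = np.zeros(3)
for t in range(200):
    w = w - eta * X.T @ (X @ w - y) / n           # T(w) = w - eta * grad, a contraction

# Banach: ||w_t - w*|| <= rho**t * ||w_0 - w*||, so the learning loss at
# step t can be predicted before running the iterations.
```

Because ρ depends only on the spectrum of the (noisy) Hessian, the same bound lets the learner forecast how a given privacy budget degrades convergence.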
The next challenge is to significantly improve the utility-privacy trade-off in terms of the quality of the trained ML models, so that our prediction of the learning outcome can be used in conjunction with the cost of sharing consumers' private data with the learner (in terms of loss of reputation, legal costs, and the implementation of privacy-preserving mechanisms and communication infrastructure). To address this, we establish a game-theoretic framework for modelling interactions across a data market. The learner can compensate the data owners for access to their private data, essentially paying them for choosing larger privacy budgets (i.e., more relaxed privacy). After negotiations between the data owners and the learners to set privacy budgets, the ML models can be trained, trading off learning loss against privacy level.

Related Work

Optimization of the trade-off between privacy and utility has been discussed extensively in the literature (Kalantari, Sankar, and Sarwate 2018; Brenner and Nissim 2010; Ghosh, Roughgarden, and Sundararajan 2012; Gupte and Sundararajan 2010). The problem of optimizing utility for differential privacy has been solved using linear programming (Bordenabe, Chatzikokolakis, and Palamidessi 2014). In (Xu et al. 2015), the trade-off between data utility and privacy preservation is discussed with respect to how game theory can be used to complete this trade-off: a sequential game is constructed between the data user and the data collector, followed by backward induction reaching a subgame-perfect Nash equilibrium. In (Xu et al. 2016), the authors focus on exchanging private information for money or other incentives provided to the data owner by the data collector, and discuss how game theory can be used to obtain an agreement between the parties involved in this trade.
Distributed/collaborative privacy-preserving ML has been investigated in (Shokri and Shmatikov 2015; Huang et al. 2018; Zhang, He, and Lee 2018; Zhang and Zhu 2017; Wu et al. 2020). Stochastic gradient descent is used in distributed ML models with additive Gaussian/Laplace noise to ensure differential privacy. By appropriately selecting the step size in stochastic gradient descent, the quality of the trained ML model can be forecast from the privacy budget (Wu et al. 2020). In (Jung 2017), the authors study the basic gradient-descent iterations of ML models as a contraction of a specific operator with a differentiable objective function, and show, via the contraction mapping theorem, how gradient descent can be accelerated while preserving its fixed points, with faster convergence.

Contributions

In this paper, we evaluate the collaborative learning model with DP from a fixed-point view of a linear-regression contraction problem. This yields a precise prediction of the learned parameter of the ML model. We optimize the trade-off between utility and privacy in collaborative learning, in a distributed fashion, using a non-cooperative game with imperfect information between multiple data owners. More specifically, this paper makes the following contributions:
• We build a non-cooperative game model for these learners to optimally trade off accuracy and privacy, according to the privacy budget and gain, with minimised learning loss.
• We demonstrate a unique Nash equilibrium for this game, providing a mutual best response in terms of the differentially private shared data and its learning loss. Moreover, this unique Nash equilibrium is shown to be maintained under imperfect information, with data owners simultaneously sharing, and learning from, each others' data.
• We use a Banach fixed-point view of the gradient responses to modify the learning algorithm and to evaluate learning iteration speed.
• Our numerical tests, built on real financial datasets where each learner trains to predict an expected annual loan rate from users' data including sensitive information, demonstrate the significant benefits of learning with our game under differentially private collaborative machine learning.

[Figure 1: Data owners 1–4 exchanging differentially private query responses.]
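The unique Nash equilibrium (mutual best response) over privacy budgets can be illustrated with a stylized game. Here each owner i picks a budget ε_i to balance an accuracy benefit, growing with the total shared budget, against a quadratic privacy cost; the payoff function, its coefficients a and c_i, and the best-response iteration are all illustrative assumptions, not the paper's model:

```python
import numpy as np

a = 2.0                        # assumed accuracy-benefit weight, shared by all owners
c = np.array([1.0, 0.5, 2.0])  # assumed per-owner privacy-cost coefficients

def best_response(eps, i):
    """Maximiser of u_i = a*log(1 + total budget) - c_i*eps_i**2 over eps_i >= 0.

    Setting du_i/deps_i = 0 gives a quadratic in eps_i; we take its
    nonnegative root (first-order condition of the assumed payoff).
    """
    s = eps.sum() - eps[i]                         # the other owners' total budget
    root = (-(1 + s) + np.sqrt((1 + s) ** 2 + 2 * a / c[i])) / 2
    return max(0.0, root)

eps = np.zeros(3)
for _ in range(200):                               # iterated (simultaneous) best response
    eps = np.array([best_response(eps, i) for i in range(3)])

# At the fixed point, no owner can improve by deviating unilaterally:
# eps is a Nash equilibrium of this stylized game.
```

Owners with a lower privacy cost end up sharing a larger budget, which mirrors the paper's point that compensation for relaxed privacy shapes the equilibrium.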