The Geometry of Differential Privacy: The Small Database and Approximate Cases.

SIAM JOURNAL ON COMPUTING(2016)

引用 28|浏览168
暂无评分
摘要
In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries and has been a focus of a long line of work. For a given set of d linear queries A over a database x is an element of R-N, we seek to find the differentially private mechanism that has the minimum mean squared error relative to the true answer Ax. For pure differential privacy, Hart and Talwar and Bhaskara et al. give an O(log(2) d) approximation to the optimal mechanism. Our first contribution is to give an O(log(2) d) approximation guarantee for the case of (epsilon, delta)-approximate differential privacy. Our mechanism adds carefully chosen correlated Gaussian noise to the answers and runs in time polynomial in d and N. The mechanism is based on a recursive construction of a "small" enclosing ellipsoid of the sensitivity polytope AB(1)(N), where B-1(N) denotes the N-dimensional l(1) ball. Optimality is established relative to the spectral lower bound, through a variant of the classical Bourgain-Tzafriri restricted invertibility theorem. We next consider this question in the case when the number of individuals n = parallel to x parallel to(1) in the database is smaller than the number of queries d. We call this the small database setting. Since the work of Blum, Ligett, and Roth [STOC '08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, 2008, pp. 609-618.] it has been known that such a restriction can allow for mechanisms with lower error. Our second main contribution is to give, for both pure and (epsilon, delta)-privacy, differentially private mechanisms which have mean squared error within polylog(d, N) of the optimal for any A and n. The approximation is achieved by first applying the mechanism without considering the size of the database and then projecting the answer to the convex hull of feasible answers space nAB(1)(N), which is guaranteed to contain the true answer Ax. By applying techniques from statistical estimation, we show such mechanisms can achieve the claimed approximation. When applied to counting queries, i.e., A with entries in {0, 1}, our mechanism achieves (O) over tilde(root n) average error for both pure and (epsilon, delta)-privacy. (O) over tilde hides polylogarithmic factors in all the parameters such as d, n, N, and 1/delta. Previously, such a bound was only known for (epsilon, delta)-privacy. For pure privacy, this bound improves upon the (O) over tilde (n 2/3) bound of Blum, Ligett, and Roth and matches the lower bound implied by I. Dinur and K. Nissim, Proceedings of the 22nd PODS, ACM, 2003, pp. 202-210 up to logarithmic factors. The third question we consider is the accuracy gap between the pure and approximate privacy notions. By applying tools from convex geometry, we make a connection between the hereditary discrepancy lower bound of S. Muthukrishnan and A. Nikolov, STOC '12: Proceedings of the 44th Symposium on Theory of Computing, 2012, pp. 1285-1292 for (epsilon, delta)-privacy and the volume lower bound of Hardt and Talwar for pure privacy. We are able to show that the gap in error between the optimal mechanisms satisfying pure and (epsilon, delta)-differential privacy is bounded by a factor of O(polylog(d, N)). The connection between hereditary discrepancy and private mechanisms also enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix A.
更多
查看译文
关键词
differential privacy,statistical estimation,convex geometry,combinatorial discrepancy
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要