# The Discrete Gaussian for Differential Privacy

NeurIPS 2020.

Keywords:

private system, gaussian distribution, empirical risk minimization, discrete gaussian, efficient algorithm

Abstract:

We show how to efficiently provide differentially private answers to counting queries (or integer-valued low-sensitivity queries) by adding discrete Gaussian noise, with essentially the same privacy and accuracy as the continuous Gaussian. The use of a discrete distribution is necessary in practice, as finite computers cannot represent …

Introduction

- Differential Privacy [DMNS06] provides a rigorous standard for ensuring that the output of an algorithm does not leak the private details of individuals contained in its input.
- The authors prove the main result on concentrated differential privacy (CDP), showing that the discrete Gaussian provides the same CDP guarantees as the continuous one.
- Theorem 4 (Discrete Gaussian Satisfies Concentrated Differential Privacy).

Highlights

- Differential Privacy [DMNS06] provides a rigorous standard for ensuring that the output of an algorithm does not leak the private details of individuals contained in its input
- Our investigations focus on three aspects of the discrete Gaussian: privacy, utility, and sampling
- We demonstrate that the discrete Gaussian provides the same level of privacy and utility as the continuous Gaussian
- We show that it can be efficiently sampled on a finite computer, addressing the shortcomings of continuous distributions discussed earlier
- In Theorem 4, we show that adding noise drawn from NZ(0, 1/ε²) to an integer-valued sensitivity-1 query satisfies ½ε²-concentrated differential privacy
- We develop new tools for converting concentrated differential privacy guarantees into approximate differential privacy guarantees, which are not specific to the discrete Gaussian and are of independent interest
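The CDP guarantee highlighted above can be sanity-checked numerically. The sketch below is not the authors' code; the truncation limit and parameter choices are ours. It estimates the Rényi divergence between two discrete Gaussians shifted by 1 and compares it against the continuous-Gaussian value α/(2σ²), which the discrete distribution never exceeds:

```python
import math

def renyi_divergence_shifted(sigma: float, alpha: float, trunc: int = 50) -> float:
    """Estimate D_alpha(N_Z(0, sigma^2) || N_Z(1, sigma^2)) by truncating
    the (rapidly converging) sums to |x| <= trunc."""
    s2 = sigma * sigma
    # The normalizing constant is the same for both distributions,
    # since the second is just an integer shift of the first.
    log_norm = math.log(sum(math.exp(-x * x / (2 * s2))
                            for x in range(-trunc, trunc + 1)))
    total = 0.0
    for x in range(-trunc, trunc + 1):
        log_p = -x * x / (2 * s2) - log_norm
        log_q = -(x - 1) ** 2 / (2 * s2) - log_norm
        total += math.exp(alpha * log_p + (1 - alpha) * log_q)
    return math.log(total) / (alpha - 1)

sigma, alpha = 2.0, 3.0
d = renyi_divergence_shifted(sigma, alpha)
bound = alpha / (2 * sigma ** 2)  # value attained by the continuous Gaussian
print(d, bound)
```

For a sensitivity-Δ query the shift becomes Δ and the bound scales to αΔ²/(2σ²).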

Results

- It is worth noting that the continuous Gaussian satisfies the same concentrated differential privacy bound, with equality for all Rényi divergence parameters α.
- The authors prove the main result on approximate differential privacy; namely, a tight bound on the privacy parameters achieved by the discrete Gaussian.
- Theorem 7 (Discrete Gaussian Satisfies Approximate Differential Privacy).
- Let M : X^n → Y be a randomized algorithm satisfying ρ-concentrated differential privacy.
- To prove Proposition 19 the authors use the following lemma which relates upper bounds on the variance of a discrete Gaussian to lower bounds on it, and vice-versa.
- Figure 4 demonstrates that the discrete Gaussian provides a better privacy-utility tradeoff than the discrete Laplace, except in two narrow parameter regimes: when the number of queries is small, or when something very close to pure differential privacy is demanded.
- The second is to efficiently and exactly sample from a discrete Laplace with scale parameter t, for any positive integer t (Proposition 32).
- On input γ ≥ 0, the procedure described in Algorithm 1 outputs one sample from Bernoulli(exp(−γ)), and requires a constant number of operations in expectation.
- On inputs s, t ∈ Z with s, t ≥ 1, the procedure described in Algorithm 2 outputs one sample from LapZ(t/s), and requires a constant number of operations in expectation.
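The Bernoulli(exp(−γ)) primitive can be realized with exact rational arithmetic and no transcendental-function evaluation. The following is our own sketch in the spirit of Algorithm 1 (not the paper's verbatim pseudocode), using a von Neumann-style streak of Bernoulli trials for γ ≤ 1 and factoring exp(−γ) for larger γ:

```python
from fractions import Fraction
from random import SystemRandom

rng = SystemRandom()  # high-quality OS randomness

def bernoulli(p: Fraction) -> int:
    """Exact Bernoulli(p) for a rational p = n/d in [0, 1]."""
    return 1 if rng.randrange(p.denominator) < p.numerator else 0

def bernoulli_exp(gamma: Fraction) -> int:
    """One exact sample from Bernoulli(exp(-gamma)), gamma >= 0 rational."""
    if gamma <= 1:
        # Run Bernoulli(gamma/k) trials until the first failure at index k;
        # the first-failure index is odd with probability exactly exp(-gamma).
        k = 1
        while bernoulli(gamma / k):
            k += 1
        return k % 2
    # exp(-gamma) = exp(-1)^floor(gamma) * exp(-(gamma - floor(gamma)))
    for _ in range(int(gamma)):
        if not bernoulli_exp(Fraction(1)):
            return 0
    return bernoulli_exp(gamma - int(gamma))

est = sum(bernoulli_exp(Fraction(1)) for _ in range(20000)) / 20000
print(est)  # ≈ exp(-1) ≈ 0.368
```

Each call consumes a constant number of Bernoulli trials in expectation, matching the stated complexity.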

Conclusion

- The authors must verify two things: (1) that, for each attempt, conditioned on outputting a value Z, the distribution of Z is LapZ(t/s), as desired; and (2) a lower bound on the probability that a given loop iteration succeeds.
- It is immediate to see that, as defined by the algorithm, V is independent of both U and D and follows a geometric distribution with parameter 1 − e−1: that is, P [V = k] = (1 − e−1) · e−k for every integer k ≥ 0.
- The number of times each loop runs is bounded by a geometric distribution. This means that, for any δ ∈ (0, 1), to generate k samples from NZ(0, σ²) (i.e., k runs of Algorithm 3), the probability of requiring more than O(k + log(1/δ)) operations is at most δ.
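The rejection loop described in the bullets above (uniform U, accept bit D, geometric V with parameter 1 − e⁻¹) can be written out concretely. This is our own sketch for the integer-scale case LapZ(t) with s = 1; the paper's Algorithm 2 additionally handles rational scales t/s. It reuses an exact Bernoulli(exp(−γ)) subroutine:

```python
from fractions import Fraction
from random import SystemRandom

rng = SystemRandom()

def bernoulli(p: Fraction) -> int:
    """Exact Bernoulli(p) for a rational p in [0, 1]."""
    return 1 if rng.randrange(p.denominator) < p.numerator else 0

def bernoulli_exp(gamma: Fraction) -> int:
    """Exact Bernoulli(exp(-gamma)) via a von Neumann-style accept streak."""
    if gamma <= 1:
        k = 1
        while bernoulli(gamma / k):
            k += 1
        return k % 2
    for _ in range(int(gamma)):
        if not bernoulli_exp(Fraction(1)):
            return 0
    return bernoulli_exp(gamma - int(gamma))

def discrete_laplace(t: int) -> int:
    """One sample from LapZ(t), integer scale t >= 1: magnitude X = U + t*V
    with U uniform, V geometric(1 - e^{-1}), plus a random sign."""
    while True:
        U = rng.randrange(t)
        if not bernoulli_exp(Fraction(U, t)):
            continue                       # rejected attempt; retry
        V = 0
        while bernoulli_exp(Fraction(1)):  # P[V = k] = (1 - e^{-1}) e^{-k}
            V += 1
        X = U + t * V
        B = rng.randrange(2)               # sign bit
        if B and X == 0:
            continue                       # avoid double-counting zero
        return (1 - 2 * B) * X

samples = [discrete_laplace(1) for _ in range(20000)]
p_zero = samples.count(0) / len(samples)
print(p_zero)  # ≈ (e-1)/(e+1) ≈ 0.462 for LapZ(1)
```

Conditioned on acceptance, P[X = x] ∝ e^(−x/t), and each attempt succeeds with constant probability, so the expected number of iterations is constant.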

Related work

- As originally observed and demonstrated by Mironov [Mir12], naïve implementations of the Laplace mechanism with floating-point arithmetic blatantly fail to ensure differential privacy, or any form of privacy at all. As a remedy, Mironov introduced the snapping mechanism, which serves as a safe replacement for the Laplace mechanism in the floating-point setting. The snapping mechanism performs rounding and truncation on top of the floating-point arithmetic. However, properly implementing and analyzing the snapping mechanism can be involved [Cov19], due to the idiosyncrasies of floating-point arithmetic. Furthermore, the snapping mechanism requires a compromise on privacy and accuracy, relative to what is theoretically achievable. Our methods avoid floating-point arithmetic entirely and do not compromise the privacy or accuracy guarantees.
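As a minimal illustration of the root cause (this is not Mironov's full attack, just the underlying mechanism): the grid of representable doubles is non-uniform, so continuous noise rounded to the nearest double lands on a magnitude-dependent lattice whose structure can leak information about the unperturbed value.

```python
import math

# The spacing between adjacent doubles (one "unit in the last place")
# grows with magnitude, so rounded continuous noise has support that
# depends on the value being perturbed.
print(math.ulp(1.0))         # 2**-52: the gap just above 1.0
print(math.ulp(2.0 ** -10))  # a much finer gap near small values
```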

Funding

- As the number of queries increases, we see that the discrete Laplace requires higher variance; for k = 100 queries, the required variance is 69% higher

Study subjects and analysis

We have implemented Algorithms 1, 2, and 3 in Python (using the fractions.Fraction class for exact rational arithmetic, and random.SystemRandom() to obtain high-quality randomness). Overall, on a standard personal computer, our basic (non-optimized) implementation produces over 1000 samples per second, even for σ² = 10^100.
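The pipeline can be sketched end to end. The code below is our own condensed rendering in the spirit of Algorithms 1-3, not the authors' implementation, and simplifies parameter handling to integer σ²; it uses exact rational arithmetic (fractions.Fraction) and OS randomness (random.SystemRandom) as described above:

```python
import math
from fractions import Fraction
from random import SystemRandom

rng = SystemRandom()

def bernoulli(p: Fraction) -> int:
    """Exact Bernoulli(p) for a rational p in [0, 1]."""
    return 1 if rng.randrange(p.denominator) < p.numerator else 0

def bernoulli_exp(gamma: Fraction) -> int:
    """Exact Bernoulli(exp(-gamma)) for rational gamma >= 0 (Algorithm 1 spirit)."""
    if gamma <= 1:
        k = 1
        while bernoulli(gamma / k):
            k += 1
        return k % 2
    for _ in range(int(gamma)):
        if not bernoulli_exp(Fraction(1)):
            return 0
    return bernoulli_exp(gamma - int(gamma))

def discrete_laplace(t: int) -> int:
    """One sample from LapZ(t), integer t >= 1 (Algorithm 2 spirit)."""
    while True:
        U = rng.randrange(t)
        if not bernoulli_exp(Fraction(U, t)):
            continue
        V = 0
        while bernoulli_exp(Fraction(1)):
            V += 1
        X = U + t * V
        B = rng.randrange(2)
        if B and X == 0:
            continue
        return (1 - 2 * B) * X

def discrete_gaussian(sigma2: Fraction) -> int:
    """Rejection-sample N_Z(0, sigma2) from a discrete Laplace proposal
    (Algorithm 3 spirit): accept Y with probability
    exp(-(|Y| - sigma2/t)^2 / (2*sigma2))."""
    t = math.isqrt(int(sigma2)) + 1  # t = floor(sigma) + 1
    while True:
        Y = discrete_laplace(t)
        gamma = (abs(Y) - sigma2 / t) ** 2 / (2 * sigma2)
        if bernoulli_exp(gamma):
            return Y

xs = [discrete_gaussian(Fraction(10)) for _ in range(4000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean ** 2
print(mean, var)  # mean ≈ 0, variance ≈ 10
big = discrete_gaussian(Fraction(10) ** 100)  # huge sigma^2 works exactly too
```

Because every probability handled is an exact rational, no floating-point arithmetic enters the sampling path, which is the point of the construction.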

Reference

- J. M. Abowd. Disclosure Avoidance for Block Level Data and Protection of Confidentiality in Public Tabulations. https://www2.census.gov/cac/sac/meetings/2018-12/abowd-disclosure-avoidance.pdf. 2018.
- M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. “Deep learning with differential privacy”. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016, pp. 308–318.
- S. Asoodeh, J. Liao, F. P. Calmon, O. Kosut, and L. Sankar. “A Better Bound Gives a Hundred Rounds: Enhanced Privacy Guarantees via f-Divergences”. In: arXiv preprint arXiv:2001.05990 (2020).
- D. Aggarwal and O. Regev. “A note on discrete Gaussian combinations of lattice vectors”. In: Chic. J. Theoret. Comput. Sci. (2016), Art. 7, 11. url: https://doi.org/10.4086/cjtcs.2016.007.
- N. Agarwal, A. T. Suresh, F. X. X. Yu, S. Kumar, and B. McMahan. “cpSGD: Communication-efficient and differentially-private distributed SGD”. In: Advances in Neural Information Processing Systems. 2018, pp. 7564–7575.
- M. Bun, C. Dwork, G. N. Rothblum, and T. Steinke. “Composable and versatile privacy via truncated CDP”. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. 2018, pp. 74–86.
- M. Bun and T. Steinke. “Concentrated differential privacy: Simplifications, extensions, and lower bounds”. In: Theory of Cryptography Conference. Springer. 2016, pp. 635–658.
- R. Bassily, A. Smith, and A. Thakurta. “Private empirical risk minimization: Efficient algorithms and tight error bounds”. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science. IEEE. 2014, pp. 464–473.
- V. Balcer and S. Vadhan. “Differential privacy on finite computers”. In: arXiv preprint arXiv:1709.05396 (2017).
- B. Balle and Y.-X. Wang. Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising. 2018. arXiv: 1805.06530 [cs.LG].
- Census. https://github.com/uscensusbureau/census2020-das-e2e/blob/master/programs/engine/primitives.py. 2018.
- C. Covington. “Snapping Mechanism Notes”. In: (2019). https://github.com/ctcovington/floating_point/blob/master/snapping_mechanism/notes/snapping_implementation_notes.pdf.
- DiffPrivLib. https://github.com/IBM/differential-privacy-library/blob/master/diffprivlib/mechanisms/laplace.py. 2019.
- C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. “Our data, ourselves: Privacy via distributed noise generation”. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer. 2006, pp. 486–503.
- C. Dwork, F. McSherry, K. Nissim, and A. Smith. “Calibrating Noise to Sensitivity in Private Data Analysis”. In: Proceedings of the Third Conference on Theory of Cryptography. TCC’06. New York, NY: Springer-Verlag, 2006, pp. 265–284. isbn: 3-540-32731-2, 978-3-540-32731-8. url: http://dx.doi.org/10.1007/11681878_14.
- I. Dinur and K. Nissim. “Revealing information while preserving privacy”. In: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 2003, pp. 202–210.
- C. Dwork and G. N. Rothblum. “Concentrated differential privacy”. In: arXiv preprint arXiv:1603.01887 (2016).
- J. Dong, A. Roth, and W. J. Su. Gaussian Differential Privacy. 2019. arXiv: 1905.02383 [cs.LG].
- C. Dwork, G. N. Rothblum, and S. Vadhan. “Boosting and differential privacy”. In: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE. 2010, pp. 51–60.
- G. E. Forsythe. “Von Neumann’s comparison method for random sampling from the normal and other distributions”. In: Mathematics of Computation 26.120 (1972), pp. 817–826.
- I. Gazeau, D. Miller, and C. Palamidessi. “Preserving differential privacy under finite-precision semantics”. In: Theoretical Computer Science 655 (2016), pp. 92–108.
- Google. https://github.com/google/differential-privacy/blob/master/differential_privacy/algorithms/distributions.cc. 2020.
- C. Gentry, C. Peikert, and V. Vaikuntanathan. “Trapdoors for hard lattices and new cryptographic constructions”. In: STOC. ACM, 2008, pp. 197–206.
- A. Ghosh, T. Roughgarden, and M. Sundararajan. “Universally utility-maximizing privacy mechanisms”. In: SIAM Journal on Computing 41.6 (2012), pp. 1673–1693.
- C. Ilvento. “Implementing the Exponential Mechanism with Base-2 Differential Privacy”. In: arXiv preprint arXiv:1912.04222 (2019).
- C. F. Karney. “Sampling exactly from the normal distribution”. In: ACM Transactions on Mathematical Software (TOMS) 42.1 (2016), pp. 1–14.
- Y.-H. Kuo, C.-C. Chiu, D. Kifer, M. Hay, and A. Machanavajjhala. “Differentially private hierarchical count-of-counts histograms”. In: Proceedings of the VLDB Endowment 11.11 (2018), pp. 1509–1521.
- P. Kairouz, S. Oh, and P. Viswanath. “The composition theorem for differential privacy”. In: IEEE Transactions on Information Theory 63.6 (2017), pp. 4037–4049.
- J. Liu. “Information theory from a functional viewpoint”. PhD thesis. Princeton University, 2018.
- I. Mironov. “On Significance of the Least Significant Bits for Differential Privacy”. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security. CCS ’12. Raleigh, North Carolina, USA: Association for Computing Machinery, 2012, 650–661. isbn: 9781450316514. url: https://doi.org/10.1145/2382196.2382264.
- I. Mironov. “Renyi differential privacy”. In: 2017 IEEE 30th Computer Security Foundations Symposium (CSF). IEEE. 2017, pp. 263–275.
- J. Murtagh and S. Vadhan. “The complexity of computing the optimal composition of differential privacy”. In: Theory of Cryptography Conference. Springer. 2016, pp. 157–175.
- C. Peikert. “An efficient and parallel Gaussian sampler for lattices”. In: Advances in cryptology—CRYPTO 2010. Vol. 6223. Lecture Notes in Comput. Sci. Springer, Berlin, 2010, pp. 80–97. url: https://doi.org/10.1007/978-3-642-14623-7_5.
- Poisson Summation Formula. Proof Wiki. url: https://proofwiki.org/wiki/Poisson_Summation_Formula.
- O. Regev. “On lattices, learning with errors, random linear codes, and cryptography”. In: Journal of the ACM (JACM) 56.6 (2009), pp. 1–40.
- O. Rivasplata. “Subgaussian random variables: An expository note”. In: (2012). http://www.stat.cmu.edu/~arinaldo/36788/subgaussians.pdf.
- N. Stephens-Davidowitz. “On the Gaussian measure over lattices”. PhD thesis. New York University, 2017.
- I. Sason and S. Verdu. “f-Divergence Inequalities”. In: IEEE Transactions on Information Theory 62.11 (2016), pp. 5973–6006. issn: 1557-9654. url: http://dx.doi.org/10.1109/TIT.2016.2603151.
- A. Schein, Z. S. Wu, A. Schofield, M. Zhou, and H. Wallach. “Locally Private Bayesian Inference for Count Models”. In: Proceedings of the 36th International Conference on Machine Learning. Ed. by K. Chaudhuri and R. Salakhutdinov. Vol. 97. Proceedings of Machine Learning Research. Long Beach, California, USA: PMLR, 2019, pp. 5638–5648. url: http://proceedings.mlr.press/v97/schein19a.html.
- J. Von Neumann. “13. various techniques used in connection with random digits”. In: Appl. Math Ser 12.36-38 (1951), p. 5.
- E. W. Weisstein. Poisson Sum Formula. MathWorld–A Wolfram Web Resource. url: https://mathworld.wolfram.com/PoissonSumFormula.html.
- WhiteNoise. https://github.com/opendifferentialprivacy/whitenoisecore/issues/242. 2020.
- R. K. Zhao, R. Steinfeld, and A. Sakzad. COSAC: COmpact and Scalable Arbitrary-Centered Discrete Gaussian Sampling over Integers. Cryptology ePrint Archive, Report 2019/1011. https://eprint.iacr.org/2019/1011. 2019.
