# Private Adaptive Gradient Methods For Convex Optimization

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, (2021): 383-392

Abstract

We study adaptive methods for differentially private convex optimization, proposing and analyzing differentially private variants of a Stochastic Gradient Descent (SGD) algorithm with adaptive stepsizes, as well as the AdaGrad algorithm. We provide upper bounds on the regret of both algorithms and show that the bounds are (worst-case) optimal.

Highlights

- While the success of stochastic gradient methods for solving empirical risk minimization has motivated their adoption across much of machine learning, increasing privacy risks in data-intensive tasks have made applying them more challenging [DMNS06]: gradients can leak users’ data, intermediate models can compromise individuals, and even final trained models may be non-private without substantial care
- We introduce Pagan (Private AdaGrad with Adaptive Noise), a new differentially private variant of stochastic gradient descent and AdaGrad
- By carefully adapting noise to the actual gradients at hand, we can achieve convergence rates that reflect the observed magnitude of the gradients, similar to the approach of Bartlett et al. [BHR07] in the non-private case; this can yield marked improvements over the typical guarantees that depend on worst-case magnitudes. (Think, for example, of a standard normal variable: its second moment is 1, while its maximum value is unbounded.) Moreover, we propose a new private adaptive optimization algorithm that analogizes AdaGrad, showing that under certain natural distributional assumptions on the problems, similar to those that separate AdaGrad from non-adaptive methods [LD19], our private versions of adaptive methods significantly outperform the standard non-adaptive private algorithms
- Since the introduction of differential privacy [DMNS06; DKMMN06], differentially private empirical risk minimization has been a subject of intense interest [CMS11; BST14; DJW13; ST13a]
- Zhou et al. [ZWB20] propose a private Stochastic Gradient Descent (SGD) algorithm where the gradients are projected to a low-dimensional subspace, which is learned using public data, and Kairouz et al. [KRRT20] develop an ε-differentially private variant of AdaGrad that projects the gradient to a low-rank subspace

- Table 1: Test perplexity of different methods. For reference, non-private SGD and AdaGrad (without clipping) achieve 75.45 and 79.74, respectively
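The adaptive-noise idea in the highlights can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual Pagan algorithm: it combines per-example clipping, Gaussian noise, and an AdaGrad-style diagonal stepsize; the function name and all parameters here are assumptions chosen for exposition, and calibrating the noise to a privacy budget is omitted.

```python
import numpy as np

def private_adagrad_step(w, g, state, *, lr=0.1, clip=1.0,
                         noise_std=1.0, eps=1e-8, rng=None):
    """One AdaGrad-style update on a privatized gradient:
    clip the gradient to L2 norm `clip`, add Gaussian noise,
    accumulate squared coordinates, and scale by the
    per-coordinate adaptive stepsize lr / sqrt(state)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    g = np.asarray(g, dtype=float)
    norm = np.linalg.norm(g)
    if norm > clip:                      # bound per-example sensitivity
        g = g * (clip / norm)
    g_priv = g + rng.normal(0.0, noise_std, size=g.shape)
    state = state + g_priv ** 2          # running sum of squared gradients
    w = w - lr * g_priv / (np.sqrt(state) + eps)
    return w, state
```

With `noise_std=0` this reduces to plain (clipped) AdaGrad, which makes the adaptive-stepsize behavior easy to check in isolation.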

Related work

- Since the introduction of differential privacy [DMNS06; DKMMN06], differentially private empirical risk minimization has been a subject of intense interest [CMS11; BST14; DJW13; ST13a]. The current standard approach to solving this problem is noisy SGD [BST14; DJW13; ACGMMTZ16; BFTT19; FKT20]. Current bounds focus on the standard Euclidean geometries familiar from classical analyses of gradient descent [Zin03; NJLS09], and the prototypical result [BST14; BFTT19] is that, for Lipschitz convex optimization problems on the ℓ₂-ball in d dimensions, an ε-differentially private version of SGD achieves excess empirical loss O(√d/(nε)) given a sample of size n; this is minimax optimal. Similar bounds also hold for other geometries (ℓₚ-balls for 1 ≤ p ≤ 2) using noisy mirror descent [AFKT21]. Alternative approaches use the stability of empirical risk minimizers of (strongly) convex functions, and include both output perturbation, where one adds noise to a regularized empirical minimizer, and objective perturbation, where one incorporates random linear noise in the objective function before optimization [CMS11].
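As a concrete illustration of the noisy-SGD baseline discussed above, here is a minimal sketch, not the exact algorithm of [BST14]: each per-example gradient is clipped to a fixed ℓ₂ norm and perturbed with Gaussian noise before the step. The function name and parameters are chosen here for exposition, and the calibration of `noise_mult` to a target privacy level is deliberately left out.

```python
import numpy as np

def noisy_sgd(grad_fn, w0, data, *, lr=0.1, clip=1.0,
              noise_mult=1.0, epochs=1, seed=0):
    """Illustrative noisy SGD: per-example gradients are clipped
    to L2 norm `clip`, then Gaussian noise with standard deviation
    `noise_mult * clip` is added before each update."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        for x in data:
            g = np.asarray(grad_fn(w, x), dtype=float)
            norm = np.linalg.norm(g)
            if norm > clip:                  # clip per-example gradient
                g = g * (clip / norm)
            g = g + rng.normal(0.0, noise_mult * clip, size=g.shape)
            w -= lr * g
    return w
```

Setting `noise_mult=0` recovers clipped SGD, so the noise level cleanly interpolates between the private and non-private procedures.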

Given the success of private SGD in such Euclidean cases and of adaptive gradient algorithms in modern large-scale learning, it is unsurprising that recent work attempts to incorporate adaptivity into private empirical risk minimization (ERM) algorithms [ZWB20; KRRT20]. In this vein, Zhou et al. [ZWB20] propose a private SGD algorithm where the gradients are projected to a low-dimensional subspace, which is learned using public data, and Kairouz et al. [KRRT20] develop an ε-differentially private variant of AdaGrad that (similarly) projects the gradient to a low-rank subspace.

References

- M. Abadi, A. Chu, I. Goodfellow, B. McMahan, I. Mironov, K. Talwar, and L. Zhang. “Deep Learning with Differential Privacy”. In: 23rd ACM Conference on Computer and Communications Security (ACM CCS). 2016, pp. 308–318.
- H. Asi and J. C. Duchi. “Stochastic (Approximate) Proximal Point Methods: Convergence, Optimality, and Adaptivity”. In: SIAM Journal on Optimization 29.3 (2019), pp. 2257–2290. url: https://arXiv.org/abs/1810.05633.
- H. Asi, V. Feldman, T. Koren, and K. Talwar. “Private Stochastic Convex Optimization: Optimal Rates in ℓ₁ Geometry”. In: arXiv:2103.01516 [cs.LG] (2021).
- R. F. Barber and J. C. Duchi. “Privacy and Statistical Risk: Formalisms and Minimax Bounds”. In: arXiv:1412.4451 [math.ST] (2014).
- R. Bassily, V. Feldman, K. Talwar, and A. Thakurta. “Private Stochastic Convex Optimization with Optimal Rates”. In: Advances in Neural Information Processing Systems 32. 2019.
- P. L. Bartlett, E. Hazan, and A. Rakhlin. “Adaptive Online Gradient Descent”. In: Advances in Neural Information Processing Systems 20. 2007.
- R. Bassily, A. Smith, and A. Thakurta. “Private empirical risk minimization: Efficient algorithms and tight error bounds”. In: 55th Annual Symposium on Foundations of Computer Science. 2014, pp. 464–473.
- A. Beck and M. Teboulle. “Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization”. In: Operations Research Letters 31 (2003), pp. 167–175.
- K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. “Differentially private empirical risk minimization”. In: Journal of Machine Learning Research 12 (2011), pp. 1069–1109.
- J. C. Duchi, E. Hazan, and Y. Singer. “Adaptive subgradient methods for online learning and stochastic optimization”. In: Journal of Machine Learning Research 12 (2011), pp. 2121–2159.
- J. C. Duchi, M. I. Jordan, and M. J. Wainwright. “Local privacy and statistical minimax rates”. In: 54th Annual Symposium on Foundations of Computer Science. 2013, pp. 429–438.
- J. C. Duchi, M. I. Jordan, and M. J. Wainwright. “Minimax Optimal Procedures for Locally Private Estimation (with discussion)”. In: Journal of the American Statistical Association 113.521 (2018), pp. 182–215.
- C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. “Our Data, Ourselves: Privacy Via Distributed Noise Generation”. In: Advances in Cryptology (EUROCRYPT 2006). 2006.
- C. Dwork, F. McSherry, K. Nissim, and A. Smith. “Calibrating noise to sensitivity in private data analysis”. In: Proceedings of the Third Theory of Cryptography Conference. 2006, pp. 265–284.
- C. Dwork and A. Roth. “The Algorithmic Foundations of Differential Privacy”. In: Foundations and Trends in Theoretical Computer Science 9.3 & 4 (2014), pp. 211–407.
- J. C. Duchi. “Introductory Lectures on Stochastic Convex Optimization”. In: The Mathematics of Data. IAS/Park City Mathematics Series. American Mathematical Society, 2018.
- J. C. Duchi. Information Theory and Statistics. Lecture Notes for Statistics 311/EE 377, Stanford University. Accessed May 2019. 2019. url: http://web.stanford.edu/class/stats311/lecture-notes.pdf.
- V. Feldman, T. Koren, and K. Talwar. “Private Stochastic Convex Optimization: Optimal Rates in Linear Time”. In: Proceedings of the Fifty-Second Annual ACM Symposium on the Theory of Computing. 2020.
- S. Hochreiter and J. Schmidhuber. “Long short-term memory”. In: Neural computation 9.8 (1997), pp. 1735–1780.
- J. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms I & II. New York: Springer, 1993.
- P. Kairouz, M. Ribero, K. Rush, and A. Thakurta. “Dimension independence in unconstrained private ERM via adaptive preconditioning”. In: arXiv:2008.06570 [cs.LG] (2020).
- D. Levy and J. C. Duchi. “Necessary and Sufficient Geometries for Gradient Methods”. In: Advances in Neural Information Processing Systems 32. 2019. url: https://arxiv.org/abs/1909.10455.
- B. McMahan and M. Streeter. “Adaptive Bound Optimization for Online Convex Optimization”. In: Proceedings of the Twenty Third Annual Conference on Computational Learning Theory. 2010.
- S. Merity, C. Xiong, J. Bradbury, and R. Socher. “Pointer Sentinel Mixture Models”. In: Proceedings of the Fifth International Conference on Learning Representations. 2017.
- A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. “Robust stochastic approximation approach to stochastic programming”. In: SIAM Journal on Optimization 19.4 (2009), pp. 1574–1609.
- V. Pichapati, A. T. Suresh, F. X. Yu, S. J. Reddi, and S. Kumar. AdaCliP: Adaptive clipping for private SGD. 2020.
- A. Smith and A. Thakurta. “Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso”. In: Proceedings of the Twenty Sixth Annual Conference on Computational Learning Theory. 2013, pp. 819–850. url: http://proceedings.mlr.press/v30/Guha13.html.
- A. Smith and A. Thakurta. “(Nearly) optimal algorithms for private online learning in full-information and bandit settings”. In: Advances in Neural Information Processing Systems 26. 2013.
- K. Talwar, A. Thakurta, and L. Zhang. “Nearly optimal private lasso”. In: Advances in Neural Information Processing Systems 28. 2015, pp. 3025–3033.
- R. Vershynin. High Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2019.
- D. Yu, H. Zhang, W. Chen, and T.-Y. Liu. “Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning”. In: International Conference on Learning Representations. 2021. url: https://openreview.net/forum?id=7aogOj_VYO0.
- M. Zinkevich. “Online Convex Programming and Generalized Infinitesimal Gradient Ascent”. In: Proceedings of the Twentieth International Conference on Machine Learning. 2003.
- Y. Zhou, Z. S. Wu, and A. Banerjee. “Bypassing the ambient dimension: Private SGD with gradient subspace identification”. In: arXiv:2007.03813 [cs.LG] (2020).
