Private Adaptive Gradient Methods For Convex Optimization

International Conference on Machine Learning, Vol. 139 (2021): 383–392


Abstract

We study adaptive methods for differentially private convex optimization, proposing and analyzing differentially private variants of a Stochastic Gradient Descent (SGD) algorithm with adaptive stepsizes, as well as the AdaGrad algorithm. We provide upper bounds on the regret of both algorithms and show that the bounds are (worst-case) optimal...
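
The abstract's two ingredients, noisy SGD with adaptive stepsizes and a private AdaGrad variant, build on the standard recipe of clipping and noising gradients. The sketch below is a minimal, hedged illustration of one such step, assuming placeholder names (grad_fn, clip_norm, sigma); it is not the paper's algorithm and omits real privacy accounting.

```python
# A minimal sketch (my illustration, not the paper's algorithm) of one
# differentially private SGD step: clip each per-example gradient, average,
# add Gaussian noise, and use a stepsize that adapts to the accumulated
# privatized gradient norms. Noise calibration here is a placeholder.
import numpy as np

def private_adaptive_sgd_step(w, batch, grad_fn, clip_norm, sigma, sum_sq, rng):
    """w: parameter vector; batch: list of examples; grad_fn(w, x): per-example
    gradient; sum_sq: running sum of squared privatized gradient norms."""
    clipped = []
    for x in batch:
        g = grad_fn(w, x)
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # bound L2 sensitivity
        clipped.append(scale * g)
    g_avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, sigma * clip_norm / len(batch), size=g_avg.shape)
    g_priv = g_avg + noise                      # Gaussian-mechanism-style perturbation
    sum_sq += float(np.dot(g_priv, g_priv))
    step = 1.0 / (np.sqrt(sum_sq) + 1e-12)      # stepsize ~ 1/sqrt(sum of squared norms)
    return w - step * g_priv, sum_sq
```

Here rng would be something like numpy.random.default_rng(), and sigma would be chosen by a privacy accountant for a target (ε, δ); both are outside the scope of this sketch.
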

Highlights
  • While the success of stochastic gradient methods for solving empirical risk minimization has motivated their adoption across much of machine learning, increasing privacy risks in data-intensive tasks have made applying them more challenging [DMNS06]: gradients can leak users’ data, intermediate models can compromise individuals, and even final trained models may be non-private without substantial care
  • We introduce Pagan (Private AdaGrad with Adaptive Noise), a new differentially private variant of stochastic gradient descent and AdaGrad
  • By carefully adapting noise to the actual gradients at hand, we can achieve convergence rates that reflect the observed magnitude of the gradients, similar to the approach of Bartlett et al. [BHR07] in the non-private case, which can yield marked improvements over the typical guarantees that depend on worst-case magnitudes. (Think, for example, of a standard normal variable: its second moment is 1, while its maximum value is unbounded.) Moreover, we propose a new private adaptive optimization algorithm that analogizes AdaGrad, showing that under certain natural distributional assumptions, similar to those that separate AdaGrad from non-adaptive methods [LD19], our private versions of adaptive methods significantly outperform the standard non-adaptive private algorithms; a schematic sketch of this adaptive-noise idea appears after these highlights
  • Since the introduction of differential privacy [DMNS06; DKMMN06], differentially private empirical risk minimization has been a subject of intense interest [CMS11; BST14; DJW13; ST13a]
  • Zhou et al. [ZWB20] propose a private Stochastic Gradient Descent (SGD) algorithm where the gradients are projected to a low-dimensional subspace (which is learned using public data), and Kairouz et al. [KRRT20] developed an ε-differentially private variant of AdaGrad which projects the gradient to a low-rank subspace
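
As referenced in the highlights above, the following sketch illustrates the adaptive-noise idea in schematic form: clip the gradient, privatize it with Gaussian noise, and let an AdaGrad-style diagonal accumulator of the privatized gradients set per-coordinate stepsizes. It is an illustration under assumed placeholder parameters (base_lr, clip_norm, sigma), not the paper's Pagan algorithm.

```python
# Schematic sketch of the adaptive-noise / adaptive-preconditioning idea
# from the highlights: privatize a clipped gradient with Gaussian noise,
# feed the privatized gradient into an AdaGrad-style diagonal accumulator,
# and take a per-coordinate step. This is not the paper's Pagan algorithm;
# base_lr, clip_norm, and sigma are illustrative placeholders.
import numpy as np

def private_adagrad_step(w, g, accum, clip_norm, sigma, rng, base_lr=0.1, eps=1e-8):
    """g: averaged minibatch gradient; accum: running sum of squared
    privatized gradient coordinates (same shape as w)."""
    g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))   # clip to bound sensitivity
    # Worst-case noise scale proportional to clip_norm; the paper's point is
    # that adapting this scale to the observed gradient magnitudes can help.
    g_priv = g + rng.normal(0.0, sigma * clip_norm, size=g.shape)
    accum = accum + g_priv ** 2                                  # AdaGrad accumulator
    w = w - base_lr * g_priv / (np.sqrt(accum) + eps)            # per-coordinate stepsizes
    return w, accum
```

The worst-case noise scale sigma * clip_norm is exactly what the highlight argues can be loose; adapting it to the observed gradient magnitudes is the paper's contribution and is not reproduced here.
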
Tables
  • Table 1: Test perplexity error of different methods. For reference, non-private SGD and AdaGrad (without clipping) achieve 75.45 and 79.74, respectively
Related work
  • Since the introduction of differential privacy [DMNS06; DKMMN06], differentially private empirical risk minimization has been a subject of intense interest [CMS11; BST14; DJW13; ST13a]. The current standard approach to solving this problem is noisy SGD [BST14; DJW13; ACGMMTZ16; BFTT19; FKT20]. Current bounds focus on the standard Euclidean geometries familiar from classical analyses of gradient descent [Zin03; NJLS09], and the prototypical result [BST14; BFTT19] is that, for Lipschitz convex optimization problems on the ℓ2-ball in d dimensions, an ε-differentially private version of SGD achieves excess empirical loss O(√d / (nε)) given a sample of size n; this is minimax optimal. Similar bounds also hold for other geometries (ℓp-balls for 1 ≤ p ≤ 2) using noisy mirror descent [AFKT21]. Alternative approaches use the stability of empirical risk minimizers of (strongly) convex functions, and include both output perturbation, where one adds noise to a regularized empirical minimizer, and objective perturbation, where one incorporates random linear noise in the objective function before optimization [CMS11].

    Given the success of private SGD for such Euclidean cases and adaptive gradient algorithms for modern large-scale learning, it is unsurprising that recent work attempts to incorporate adaptivity into private empirical risk minimization (ERM) algorithms [ZWB20; KRRT20]. In this vein, Zhou et al. [ZWB20] propose a private SGD algorithm where the gradients are projected to a low-dimensional subspace (which is learned using public data), and Kairouz et al. [KRRT20] developed an ε-differentially private variant of AdaGrad which (similarly) projects the gradient to a low-rank subspace.
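
To make the gradient-subspace approach described above concrete, the sketch below estimates a rank-k subspace from gradients computed on public data and privatizes only the k subspace coordinates. It is a loose illustration of the idea in [ZWB20] and [KRRT20], not their actual procedures; public_grads, k, clip_norm, and sigma are assumed inputs.

```python
# Loose illustration of the gradient-subspace idea attributed above to
# [ZWB20] and [KRRT20]: estimate a rank-k subspace from gradients computed
# on public data, then clip, noise, and update only within that subspace.
# The cited papers' actual procedures and privacy accounting differ.
import numpy as np

def public_subspace(public_grads, k):
    """Top-k right singular vectors of an (m x d) matrix of public gradients."""
    _, _, vt = np.linalg.svd(public_grads, full_matrices=False)
    return vt[:k]                                   # shape (k, d)

def project_and_privatize(g, V_k, clip_norm, sigma, rng):
    """Project a private gradient onto the public subspace, privatize the
    k low-dimensional coordinates, and map back to the full space."""
    z = V_k @ g                                     # coordinates in the rank-k subspace
    z = z * min(1.0, clip_norm / (np.linalg.norm(z) + 1e-12))
    z = z + rng.normal(0.0, sigma * clip_norm, size=z.shape)   # noise in k dims, not d
    return V_k.T @ z
```
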
References
  • [ACGMMTZ16] M. Abadi, A. Chu, I. Goodfellow, B. McMahan, I. Mironov, K. Talwar, and L. Zhang. “Deep Learning with Differential Privacy”. In: 23rd ACM Conference on Computer and Communications Security (ACM CCS). 2016, pp. 308–318.
  • H. Asi and J. C. Duchi. “Stochastic (Approximate) Proximal Point Methods: Convergence, Optimality, and Adaptivity”. In: SIAM Journal on Optimization 29.3 (2019), pp. 2257–2290. url: https://arXiv.org/abs/1810.05633.
  • H. Asi, V. Feldman, T. Koren, and K. Talwar. “Private Stochastic Convex Optimization: Optimal Rates in ℓ1 Geometry”. In: arXiv:2103.01516 [cs.LG] (2021).
  • R. F. Barber and J. C. Duchi. “Privacy and Statistical Risk: Formalisms and Minimax Bounds”. In: arXiv:1412.4451 [math.ST] (2014).
  • R. Bassily, V. Feldman, K. Talwar, and A. Thakurta. “Private Stochastic Convex Optimization with Optimal Rates”. In: Advances in Neural Information Processing Systems 32. 2019.
  • P. L. Bartlett, E. Hazan, and A. Rakhlin. “Adaptive Online Gradient Descent”. In: Advances in Neural Information Processing Systems 20. 2007.
  • R. Bassily, A. Smith, and A. Thakurta. “Private empirical risk minimization: Efficient algorithms and tight error bounds”. In: 55th Annual Symposium on Foundations of Computer Science. 2014, pp. 464–473.
  • A. Beck and M. Teboulle. “Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization”. In: Operations Research Letters 31 (2003), pp. 167–175.
  • K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. “Differentially private empirical risk minimization”. In: Journal of Machine Learning Research 12 (2011), pp. 1069–1109.
  • J. C. Duchi, E. Hazan, and Y. Singer. “Adaptive subgradient methods for online learning and stochastic optimization”. In: Journal of Machine Learning Research 12 (2011), pp. 2121–2159.
  • J. C. Duchi, M. I. Jordan, and M. J. Wainwright. “Local privacy and statistical minimax rates”. In: 54th Annual Symposium on Foundations of Computer Science. 2013, pp. 429–438.
  • J. C. Duchi, M. I. Jordan, and M. J. Wainwright. “Minimax Optimal Procedures for Locally Private Estimation (with discussion)”. In: Journal of the American Statistical Association 113.521 (2018), pp. 182–215.
  • C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. “Our Data, Ourselves: Privacy Via Distributed Noise Generation”. In: Advances in Cryptology (EUROCRYPT 2006). 2006.
  • C. Dwork, F. McSherry, K. Nissim, and A. Smith. “Calibrating noise to sensitivity in private data analysis”. In: Proceedings of the Third Theory of Cryptography Conference. 2006, pp. 265–284.
  • C. Dwork and A. Roth. “The Algorithmic Foundations of Differential Privacy”. In: Foundations and Trends in Theoretical Computer Science 9.3 & 4 (2014), pp. 211–407.
  • J. C. Duchi. “Introductory Lectures on Stochastic Convex Optimization”. In: The Mathematics of Data. IAS/Park City Mathematics Series. American Mathematical Society, 2018.
  • J. C. Duchi. Information Theory and Statistics. Lecture Notes for Statistics 311/EE 377, Stanford University. Accessed May 2019. 2019. url: http://web.stanford.edu/class/stats311/lecture-notes.pdf.
  • V. Feldman, T. Koren, and K. Talwar. “Private Stochastic Convex Optimization: Optimal Rates in Linear Time”. In: Proceedings of the Fifty-Second Annual ACM Symposium on the Theory of Computing. 2020.
  • S. Hochreiter and J. Schmidhuber. “Long short-term memory”. In: Neural computation 9.8 (1997), pp. 1735–1780.
  • J. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms I & II. New York: Springer, 1993.
  • P. Kairouz, M. Ribero, K. Rush, and A. Thakurta. “Dimension independence in unconstrained private ERM via adaptive preconditioning”. In: arXiv:2008.06570 [cs.LG] (2020).
  • D. Levy and J. C. Duchi. “Necessary and Sufficient Geometries for Gradient Methods”. In: Advances in Neural Information Processing Systems 32. 2019. url: https://arxiv.org/abs/1909.10455.
  • B. McMahan and M. Streeter. “Adaptive Bound Optimization for Online Convex Optimization”. In: Proceedings of the Twenty Third Annual Conference on Computational Learning Theory. 2010.
  • S. Merity, C. Xiong, J. Bradbury, and R. Socher. “Pointer Sentinel Mixture Models”. In: Proceedings of the Fifth International Conference on Learning Representations. 2017.
  • A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. “Robust stochastic approximation approach to stochastic programming”. In: SIAM Journal on Optimization 19.4 (2009), pp. 1574–1609.
  • V. Pichapati, A. T. Suresh, F. X. Yu, S. J. Reddi, and S. Kumar. AdaCliP: Adaptive clipping for private SGD. 2020.
  • A. Smith and A. Thakurta. “Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso”. In: Proceedings of the Twenty Sixth Annual Conference on Computational Learning Theory. 2013, pp. 819–850. url: http://proceedings.mlr.press/v30/Guha13.html.
  • A. Smith and A. Thakurta. “(Nearly) optimal algorithms for private online learning in full-information and bandit settings”. In: Advances in Neural Information Processing Systems 26. 2013.
  • K. Talwar, A. Thakurta, and L. Zhang. “Nearly optimal private lasso”. In: Advances in Neural Information Processing Systems 28. 2015, pp. 3025–3033.
  • R. Vershynin. High Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2019.
  • D. Yu, H. Zhang, W. Chen, and T.-Y. Liu. “Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning”. In: International Conference on Learning Representations. 2021. url: https://openreview.net/forum?id=7aogOj_VYO0.
  • M. Zinkevich. “Online Convex Programming and Generalized Infinitesimal Gradient Ascent”. In: Proceedings of the Twentieth International Conference on Machine Learning. 2003.
  • Y. Zhou, Z. S. Wu, and A. Banerjee. “Bypassing the ambient dimension: Private SGD with gradient subspace identification”. In: arXiv:2007.03813 [cs.LG] (2020).