A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction.

SP(2023)

Abstract
We study the bias introduced in Differentially-Private Stochastic Gradient Descent (DP-SGD) with clipped or normalized per-sample gradient. As one of the most popular but artificial operations to ensure bounded sensitivity, gradient clipping enables composite privacy analysis of many iterative optimization methods without additional assumptions on either learning models or input data. Despite its wide applicability, gradient clipping also presents theoretical challenges in systematically instructing improvement of privacy or utility. In general, without an assumption on globally-bounded gradient, classic convergence analyses do not apply to clipped gradient descent. Further, given limited understanding of the utility loss, many existing improvements to DP-SGD are heuristic, especially in the applications of private deep learning. In this paper, we provide meaningful theoretical analysis validated by thorough empirical results of DP-SGD. We point out that the bias caused by gradient clipping is underestimated in previous works. For generic non-convex optimization via DP-SGD, we show one key factor contributing to the bias is the sampling noise of stochastic gradient to be clipped. Accordingly, we use the developed theory to build a series of improvements for sampling noise reduction from various perspectives. From an optimization angle, we study variance reduction techniques and propose inner-outer momentum. At the learning model (neural network) level, we propose several tricks to enhance network internal normalization and BatchClipping to carefully clip the gradient of a batch of samples. For data preprocessing, we provide theoretical justification of recently proposed improvements via data normalization and (self-)augmentation. Putting these systematic improvements together, private deep learning via DP-SGD can be significantly strengthened in many tasks. 
For example, in computer vision applications, with an (epsilon = 8, delta = 10^{-5}) DP guarantee, we successfully train ResNet20 on CIFAR10 and SVHN with test accuracy 76.0% and 90.1%, respectively; for natural language processing, with (epsilon = 4, delta = 10^{-5}), we successfully train a recurrent neural network on IMDb data with test accuracy 77.5%.
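To make the clipping operation concrete, the following is a minimal sketch of one DP-SGD gradient estimate with per-sample clipping, written in plain Python. It is not the paper's implementation: the helper names (`clip`, `dp_sgd_gradient`, `mean`), the clipping threshold `c`, the noise multiplier `sigma`, and the toy gradients are all assumptions chosen for illustration. The toy data gives every sample the same underlying signal plus large sampling noise, the setting in which the abstract says clipping bias arises.

```python
import math
import random


def clip(grad, c):
    """Scale grad so that its L2 norm is at most c (per-sample clipping)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def mean(vectors):
    """Coordinate-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dp_sgd_gradient(per_sample_grads, c, sigma, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip(grad, c)):
            total[i] += g
    # Assumption: noise standard deviation sigma * c, scaled to the
    # clipping threshold c that bounds each sample's contribution.
    noisy = [t + rng.gauss(0.0, sigma * c) for t in total]
    return [v / len(per_sample_grads) for v in noisy]


# Hypothetical toy data: a shared signal plus heavy sampling noise.
rng = random.Random(0)
signal = [1.0, 0.0]
grads = [[s + rng.gauss(0.0, 5.0) for s in signal] for _ in range(1000)]

true_mean = mean(grads)                             # unclipped average
clipped_mean = mean([clip(g, 1.0) for g in grads])  # average after clipping
private_grad = dp_sgd_gradient(grads, c=1.0, sigma=1.0, rng=rng)
```

Comparing `true_mean` with `clipped_mean` shows the effect in isolation: the average of clipped gradients is in general not a rescaled copy of the true average, and the gap depends on how noisy the per-sample gradients are. This is only an illustration of the phenomenon, not a reproduction of the paper's analysis.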
Keywords
Differential-Privacy,DP-SGD,BatchClipping,Non-convex-Optimization