Differentially Private Representation Learning via Image Captioning
arxiv(2024)
摘要
Differentially private (DP) machine learning is considered the gold-standard
solution for training a model from sensitive data while still preserving
privacy. However, a major barrier to achieving this ideal is its sub-optimal
privacy-accuracy trade-off, which is particularly visible in DP representation
learning. Specifically, it has been shown that under modest privacy budgets,
most models learn representations that are not significantly better than
hand-crafted features. In this work, we show that effective DP representation
learning can be done via image captioning and scaling up to internet-scale
multimodal datasets. Through a series of engineering tricks, we successfully
train a DP image captioner (DP-Cap) on a 233M subset of LAION-2B from scratch
using a reasonable amount of computation, and obtaining unprecedented
high-quality image features that can be used in a variety of downstream vision
and vision-language tasks. For example, under a privacy budget of
ε=8, a linear classifier trained on top of learned DP-Cap features
attains 65.8
of 56.5
representation learning cannot be achieved by training from scratch.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要