Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data
arXiv (2024)
Abstract
The growing use of machine learning (ML) has raised concerns that an ML model
may reveal private information about an individual who has contributed to the
training dataset. To prevent leakage of sensitive data, we consider using
differentially-private (DP), synthetic training data instead of real training
data to train an ML model. A key desirable property of synthetic data is its
ability to preserve the low-order marginals of the original distribution. Our
main contribution comprises novel upper and lower bounds on the excess
empirical risk of linear models trained on such synthetic data, for continuous
and Lipschitz loss functions. We complement our theoretical results with
extensive experiments.
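
To make the bounded quantity concrete, here is a minimal sketch of one common reading of "excess empirical risk". It is an assumption, not the paper's exact definition: the empirical risk on the real data of a linear model fitted to synthetic data, minus the smallest empirical risk attainable on the real data. The absolute loss stands in for a continuous, Lipschitz loss. The names `empirical_risk`, `fit`, and `excess_empirical_risk` are hypothetical, and the "synthetic" set is a noisy stand-in, not the output of a differentially private generator.

```python
import random

def empirical_risk(w, data):
    """Mean absolute loss of the 1-D linear predictor x -> w * x on data."""
    return sum(abs(w * x - y) for x, y in data) / len(data)

def fit(data, grid):
    """Return the weight on a fixed grid that minimizes the empirical risk."""
    return min(grid, key=lambda w: empirical_risk(w, data))

def excess_empirical_risk(real, synthetic, grid):
    """Risk on real data of the synthetic-trained model, minus the best
    risk achievable on real data over the same grid (assumed definition)."""
    w_syn = fit(synthetic, grid)
    w_real = fit(real, grid)
    return empirical_risk(w_syn, real) - empirical_risk(w_real, real)

rng = random.Random(0)

# Toy "real" data: y is roughly 2 * x with small Gaussian noise.
real = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    real.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))

# Hypothetical stand-in for synthetic data: freshly sampled points with
# larger noise. A real pipeline would use a DP synthetic-data generator.
synthetic = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    synthetic.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

# Candidate weights in [-5, 5] with step 0.01.
grid = [i / 100 for i in range(-500, 501)]

gap = excess_empirical_risk(real, synthetic, grid)
print(f"excess empirical risk: {gap:.4f}")
```

Because both models are chosen from the same grid and scored on the same real data, the gap is non-negative by construction. It is zero when the synthetic-trained weight happens to minimize the real-data risk as well.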