The Devil's Advocate: Shattering the Illusion of Unexploitable Data using Diffusion Models
arXiv (2023)
Abstract
Protecting personal data against exploitation by machine learning models is
crucial. Recently, availability attacks have shown great promise in providing an
extra layer of protection against the unauthorized use of data to train neural
networks. These methods aim to add imperceptible noise to clean data so that
the neural networks cannot extract meaningful patterns from the protected data,
claiming that they can make personal data "unexploitable." This paper provides
a strong countermeasure against such approaches, showing that unexploitable
data might only be an illusion. In particular, we leverage the power of
diffusion models and show that a carefully designed denoising process can
counteract the effectiveness of the data-protecting perturbations. We
rigorously analyze our algorithm, and theoretically prove that the amount of
required denoising is directly related to the magnitude of the data-protecting
perturbations. Our approach, called AVATAR, delivers state-of-the-art
performance against a suite of recent availability attacks in various
scenarios, outperforming adversarial training even under distribution mismatch
between the diffusion model and the protected data. Our findings call for more
research into making personal data unexploitable, showing that this goal is far
from being achieved. Our implementation is available at this repository:
https://github.com/hmdolatabadi/AVATAR.
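The abstract states that the amount of denoising required grows with the magnitude of the data-protecting perturbation. The sketch below is not AVATAR's algorithm and not the paper's theorem; it only illustrates that intuition with a standard Gaussian fact. Diffusing a clean input x and a perturbed input x + delta forward to step t gives two Gaussians with shared covariance (1 - alpha_bar_t) I whose means differ by sqrt(alpha_bar_t) * delta, so their KL divergence is alpha_bar_t * ||delta||^2 / (2 * (1 - alpha_bar_t)). A larger perturbation therefore needs a later step t (more noise) before the two become hard to tell apart. The DDPM-style linear beta schedule (1e-4 to 0.02, T = 1000), the KL threshold tau, the 3x32x32 image size, and all function names are assumptions chosen for illustration.

```python
import math


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an assumed DDPM-style linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def diffused_kl(delta_norm, alpha_bar):
    """KL between q(x_t | x) and q(x_t | x + delta): Gaussians sharing covariance
    (1 - alpha_bar) I whose means differ by sqrt(alpha_bar) * delta."""
    return alpha_bar * delta_norm ** 2 / (2.0 * (1.0 - alpha_bar))


def min_diffusion_step(delta_norm, tau, alpha_bars):
    """Hypothetical helper: smallest step t (1-indexed) at which the diffused clean
    and perturbed distributions are within KL tau; None if no step suffices."""
    for t, ab in enumerate(alpha_bars, start=1):
        if diffused_kl(delta_norm, ab) <= tau:
            return t
    return None


schedule = linear_alpha_bars()
for eps in (2 / 255, 4 / 255, 8 / 255, 16 / 255):
    # Worst-case l2 norm of an l_inf-bounded perturbation on an assumed 3x32x32 image
    delta_norm = eps * math.sqrt(3 * 32 * 32)
    t_star = min_diffusion_step(delta_norm, tau=0.1, alpha_bars=schedule)
    print(f"eps={eps:.4f}  t*={t_star}")
```

Because alpha_bar_t decreases monotonically in t, the returned step is non-decreasing in the perturbation norm, which mirrors, in a simplified form, the qualitative link between perturbation magnitude and required denoising described above.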
Keywords
unexploitable data