Probing Unlearned Diffusion Models: A Transferable Adversarial Attack Perspective
arXiv (2024)
Abstract
Advanced text-to-image diffusion models raise safety concerns regarding
identity privacy violation, copyright infringement, and Not Safe For Work
content generation. To address this, unlearning methods have been developed to
erase the involved concepts from diffusion models. However, these unlearning
methods only shift the text-to-image mapping and preserve the visual content
within the generative space of diffusion models, leaving a fatal flaw that
allows the erased concepts to be restored. This erasure trustworthiness problem
needs to be probed, but previous methods are sub-optimal from two perspectives:
(1) Lack of transferability: some methods operate in a white-box setting that
requires access to the unlearned model, and the learned adversarial input often
fails to transfer to other unlearned models for concept restoration; (2) Limited
attack: prompt-level methods struggle to restore narrow concepts, such as
celebrity identity, from unlearned models. Therefore, this paper aims to
leverage the transferability of adversarial attacks to probe unlearning
robustness under a black-box setting. This challenging scenario assumes that the
unlearning method is unknown and the unlearned model is inaccessible for
optimization, so the attack must be capable of transferring across different
unlearned models. Specifically, we employ an adversarial search strategy to find
an adversarial embedding that transfers across different unlearned models. This
strategy adopts the original Stable Diffusion model as a surrogate model to
iteratively erase and search for embeddings, enabling it to find an embedding
that restores the target concept under different unlearning methods. Extensive
experiments demonstrate the transferability of the searched adversarial
embedding across several state-of-the-art unlearning methods and its
effectiveness for concepts at different levels.
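The abstract describes the erase-and-search loop only at a high level. The sketch below is a toy NumPy illustration of that alternating structure under heavy assumptions; it is not the authors' implementation. Everything in it is hypothetical: a linear map `W` stands in for the surrogate Stable Diffusion model, the rank-one `erase` function stands in for an unlearning method, and the objective (mean squared error over all erased surrogate copies so far), the names, and the hyperparameters are all illustrative choices. The toy also mirrors the flaw the abstract points out: erasure only damps the mapping along one direction, so the target output remains reachable from some other embedding.

```python
import numpy as np


def erase(W, e, strength=0.5):
    """Hypothetical toy 'unlearning': damp the model's response along e.

    W' = W (I - strength * u u^T) with u = e / ||e||, so W' e = (1 - strength) W e,
    while directions orthogonal to e are left untouched.
    """
    u = e / np.linalg.norm(e)
    return W @ (np.eye(len(e)) - strength * np.outer(u, u))


def loss(models, e, y_target):
    """Mean (over models) of 0.5 * squared distance between output and target."""
    return float(np.mean([0.5 * np.sum((M @ e - y_target) ** 2) for M in models]))


def search(models, e, y_target, steps=300, lr=0.1):
    """Gradient descent on the embedding so it reaches y_target under every model."""
    for _ in range(steps):
        grad = np.mean([M.T @ (M @ e - y_target) for M in models], axis=0)
        e = e - lr * grad
    return e


def erase_and_search(W_surrogate, e_concept, y_target, rounds=3):
    """Alternate: erase the current embedding from the surrogate, then re-search."""
    erased_models = []
    e = e_concept.copy()
    for _ in range(rounds):
        base = erased_models[-1] if erased_models else W_surrogate
        erased_models.append(erase(base, e))           # simulate unlearning of e
        e = search(erased_models, e, y_target)         # recover the concept anyway
    return e, erased_models


rng = np.random.default_rng(0)
d = 8
W = np.eye(d) + 0.1 * rng.standard_normal((d, d))  # toy surrogate "generator"
c = rng.standard_normal(d)
c /= np.linalg.norm(c)                             # toy embedding of the concept
y = W @ c                                          # output the concept produces

e_adv, erased = erase_and_search(W, c, y)

# An unseen toy "unlearned model": a different-strength erasure of the concept.
W_hidden = erase(W, c, strength=0.8)
print("hidden model, original embedding:", loss([W_hidden], c, y))
print("hidden model, searched embedding:", loss([W_hidden], e_adv, y))
```

Averaging the loss over every erased copy forces a single embedding to work for several erasure variants at once, a crude stand-in for transferability. The two printed losses let you inspect the effect in this toy; they say nothing about how the real method performs on actual unlearned diffusion models.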