Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts
arXiv (2024)
Abstract
Highly realistic AI-generated face forgeries, known as deepfakes, have raised
serious social concerns. Although DNN-based face forgery detection models have
achieved good performance, they are vulnerable to the latest generative
methods, which leave fewer forgery traces, and to adversarial attacks. This
limited generalization and robustness undermines the credibility of detection
results and calls for better explanations. In this work, we provide
counterfactual explanations for face forgery detection from an
artifact-removal perspective.
Specifically, we first invert the forgery images into the StyleGAN latent
space, and then adversarially optimize their latent representations under
discrimination supervision from the target detection model. We verify the
effectiveness of the proposed explanations from two aspects: (1) Counterfactual
Trace Visualization: the enhanced forgery images help reveal artifacts when
visually contrasted with the original images under two different visualization
methods; (2) Transferable Adversarial Attacks: the adversarial forgery images
generated by attacking one detection model are able to mislead other detection
models, implying that the removed artifacts are general. Extensive experiments
demonstrate that our method achieves over 90% attack transferability. Compared
with naive adversarial noise methods, our method exploits both generative and
discriminative model priors and optimizes the latent representations in a
synthesis-by-analysis manner, which constrains the search for counterfactual
explanations to the natural face manifold. As a result, more general
counterfactual traces can be found and better adversarial attack
transferability can be achieved.
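The core procedure in the abstract, inverting a forgery image into the StyleGAN latent space and then adversarially optimizing the latent code against the target detector, can be sketched as follows. This is a minimal illustration, not the authors' implementation: generator (latent to image), detector (image to a "fake" logit), w_init (the inverted latent), and the loss weight lam are all assumed or hypothetical.

```python
# Minimal sketch of the artifact-removal optimization described in the abstract:
# re-synthesize the face from a StyleGAN latent code and push the target
# detector toward the "real" decision, while a proximity term keeps the result
# close to the inverted latent. `generator` and `detector` are assumed
# pre-trained callables with illustrative interfaces.
import torch
import torch.nn.functional as F

def remove_artifacts(generator, detector, w_init, steps=200, lr=0.01, lam=1.0):
    """Adversarially optimize a StyleGAN latent code w (synthesis-by-analysis)."""
    w = w_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x = generator(w)                      # re-synthesize the face from w
        fake_logit = detector(x)              # detector's "fake" score
        # Counterfactual goal: flip the detector's prediction toward "real"
        adv_loss = F.binary_cross_entropy_with_logits(
            fake_logit, torch.zeros_like(fake_logit))
        # Stay close to the inverted latent so identity/content is preserved
        reg_loss = F.mse_loss(w, w_init)
        loss = adv_loss + lam * reg_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach(), generator(w).detach()  # enhanced (artifact-removed) image
```

Because every candidate image is produced by the generator, the search never leaves StyleGAN's face manifold; this is the synthesis-by-analysis property the abstract contrasts with pixel-space adversarial noise.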
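For the first verification aspect, counterfactual trace visualization, one simple way to contrast the original and enhanced images is a per-pixel difference heatmap. The abstract mentions two visualization methods without naming them, so the absolute-difference map below is only an assumed example:

```python
# Hypothetical counterfactual trace visualization: contrasting the original
# forgery with its enhanced (artifact-removed) version highlights where the
# detector's evidence lived. The paper's actual visualization methods may differ.
import torch

def trace_map(original, enhanced):
    """Per-pixel counterfactual trace: |original - enhanced|, channel-averaged."""
    diff = (original - enhanced).abs().mean(dim=1, keepdim=True)  # (N,1,H,W)
    # Normalize each map to [0, 1] for display
    flat = diff.flatten(1)
    lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
    hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (diff - lo) / (hi - lo + 1e-8)
```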
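For the second verification aspect, transferable adversarial attacks, one can measure how often images optimized against one detector also fool detectors not used during optimization. A sketch with hypothetical detector handles:

```python
# Hypothetical transferability check: images optimized against a source
# detector are fed to other (target) detectors; the fraction now classified
# as "real" measures attack transferability. Detector names are illustrative.
import torch

@torch.no_grad()
def transfer_rate(target_detector, adv_images, threshold=0.0):
    logits = target_detector(adv_images)            # "fake" logits
    fooled = (logits < threshold).float().mean()    # predicted "real"
    return fooled.item()

# Example usage with assumed detector handles det_a and det_b:
# for name, det in {"DetectorA": det_a, "DetectorB": det_b}.items():
#     print(name, transfer_rate(det, enhanced_images))
```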