FeConDefense: Reversing adversarial attacks via feature consistency loss

Computer Communications (2023)

Abstract
Existing adversarial defense methods often employ adversarial training or data pre-processing techniques to defend against adversarial attacks. However, adversarial training is burdensome, as it requires finding a single representation that works across all possible attacks, and excessive training may decrease the network model's classification accuracy. Data pre-processing methods, meanwhile, focus on eliminating adversarial perturbations by modifying the input samples, but they do not consider the internal relationships between reverse perturbations and adversarial examples, so the modifications they generate lack specificity. In this paper, we propose a novel adversarial defense method, named FeConDefense, which aims to reverse adversarial attacks by analyzing the intrinsic features of images. Specifically, we first extract two different features from each adversarial example using two different network models. We then design a novel feature consistency loss to measure the distance between these two features. Finally, we integrate the feature consistency loss into contrastive learning to generate a reverse perturbation for each adversarial example. Comprehensive experiments against different adversarial attack methods demonstrate that FeConDefense achieves state-of-the-art results in reversing adversarial perturbations and improving the robustness of image classifiers.
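The pipeline the abstract describes (two feature views of an adversarial example, a consistency loss between them, and an optimized reverse perturbation) can be sketched in miniature. This is an illustrative toy only, not the paper's method: the two "network models" are stand-in random linear projections, the feature consistency loss is assumed to be one minus cosine similarity, the contrastive-learning integration is omitted, and numerical gradients replace backpropagation.

```python
import numpy as np

# Toy stand-ins for the paper's two feature-extraction networks
# (hypothetical fixed random projections, for illustration only).
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 64))  # "feature extractor" 1
W2 = rng.standard_normal((16, 64))  # "feature extractor" 2

def feature_consistency_loss(f1, f2):
    # One plausible choice of distance between the two feature
    # views: 1 - cosine similarity (an assumption, not the paper's).
    denom = np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12
    return 1.0 - (f1 @ f2) / denom

def loss_at(x):
    # Consistency loss between the two feature views of input x.
    return feature_consistency_loss(W1 @ x, W2 @ x)

def reverse_perturbation(x_adv, steps=30, lr=0.1, eps=1e-5):
    # Iteratively build a reverse perturbation delta that reduces
    # the feature consistency loss, via numerical gradient descent.
    delta = np.zeros_like(x_adv)
    for _ in range(steps):
        base = loss_at(x_adv + delta)
        grad = np.empty_like(delta)
        for i in range(delta.size):
            probe = delta.copy()
            probe[i] += eps
            grad[i] = (loss_at(x_adv + probe) - base) / eps
        delta -= lr * grad
    return delta

x_adv = rng.standard_normal(64)  # stand-in "adversarial example"
delta = reverse_perturbation(x_adv)
print(loss_at(x_adv), loss_at(x_adv + delta))  # loss should drop
```

In the actual method the perturbation would be optimized by backpropagating through the two networks; the finite-difference loop above merely makes the optimization target concrete.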
Keywords
Adversarial defense, Contrastive learning, Feature consistency loss