Gradient Similarity: An Explainable Approach to Detect Adversarial Attacks against Deep Learning.

arXiv: Computer Vision and Pattern Recognition (2018)

Citations: 25 | Views: 4
Abstract
Deep neural networks are susceptible to small-but-specific adversarial perturbations capable of deceiving the network. This vulnerability can lead to potentially harmful consequences in security-critical applications. To address this vulnerability, we propose a novel metric called Gradient Similarity that allows us to capture the influence of training data on test inputs. We show that Gradient Similarity behaves differently for normal and adversarial inputs, and enables us to detect a variety of adversarial attacks with a near-perfect ROC-AUC of 95-100%. Even white-box adversaries equipped with perfect knowledge of the system cannot bypass our detector easily. On the MNIST dataset, white-box attacks are either detected with a high ROC-AUC of 87-96%, or require very high distortion to bypass our detector.
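The abstract does not spell out how the metric is computed. One plausible reading is a cosine similarity between the parameter gradient of the loss on a test input and the parameter gradients on training examples, averaged over a sample of the training set. The sketch below illustrates that reading in PyTorch; the model, the use of the predicted label for the test input, the sub-sampling of training data, and the mean aggregation are all assumptions for illustration, not the authors' reference implementation.

```python
# Minimal sketch of a gradient-similarity score, assuming a cosine-similarity
# formulation between per-example parameter gradients. Not the paper's exact metric.
import torch
import torch.nn.functional as F

def flat_grad(model, x, y):
    """Loss gradient w.r.t. all trainable parameters, flattened into one vector.

    x: a single input tensor (no batch dim), y: a 0-dim long tensor (class label).
    """
    logits = model(x.unsqueeze(0))
    loss = F.cross_entropy(logits, y.unsqueeze(0))
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def gradient_similarity(model, x_test, y_pred, train_samples):
    """Average cosine similarity between the test-input gradient and the
    gradients of a (sub)sample of training examples (hypothetical aggregation)."""
    g_test = flat_grad(model, x_test, y_pred)
    sims = []
    for x_tr, y_tr in train_samples:
        g_tr = flat_grad(model, x_tr, y_tr)
        sims.append(F.cosine_similarity(g_test, g_tr, dim=0))
    return torch.stack(sims).mean()
```

A detector built on such a score would threshold it, relying on adversarial inputs scoring differently from normal ones; sweeping the threshold over held-out normal and adversarial examples is what produces ROC curves like the 95-100% ROC-AUC reported above.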