Detecting adversarial attacks through neural activations
semanticscholar (2021)
Abstract
This paper presents a methodology to detect adversarial manipulations to image data by examining the activation patterns in a neural network. By comparing a sample’s layer-wise neural activations to clusters of its predicted class, we are able to detect irregularities between a model’s layer activations and its final predicted class. We evaluate our detection method using the FGSM attack method as well as the Carlini-Wagner L2 attack and misclassified images from the ImageNet dataset.
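The detection idea described in the abstract can be sketched as a nearest-centroid consistency check: compare each layer's activations for a sample against per-class activation centroids, and flag the sample when the layer-wise nearest classes disagree with the model's final prediction. The sketch below is illustrative only, with randomly generated stand-in centroids and hypothetical helper names (`layerwise_votes`, `is_suspicious`); it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 classes, 2 layers, 8-dimensional activations per layer.
n_classes, n_layers, dim = 3, 2, 8

# Per-class centroids of layer activations, which would normally be estimated
# from clean training data (random stand-ins here for illustration).
centroids = rng.normal(size=(n_layers, n_classes, dim))

def layerwise_votes(activations):
    """For each layer, return the class whose activation centroid is nearest."""
    votes = []
    for layer, act in enumerate(activations):
        dists = np.linalg.norm(centroids[layer] - act, axis=1)
        votes.append(int(np.argmin(dists)))
    return votes

def is_suspicious(activations, predicted_class):
    """Flag a sample whose layer activations disagree with its prediction."""
    return any(v != predicted_class for v in layerwise_votes(activations))

# A clean-looking sample: activations lie near the class-1 centroids, so the
# layer-wise votes agree with a class-1 prediction and no flag is raised.
clean = centroids[:, 1, :] + 0.01 * rng.normal(size=(n_layers, dim))
print(is_suspicious(clean, predicted_class=1))
```

An adversarially perturbed input would typically show the opposite pattern: intermediate activations that remain close to the true class's centroids while the final prediction has been pushed to a different class, which this check would flag.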