Detecting adversarial attacks through neural activations
semanticscholar (2021)
Abstract
This paper presents a methodology to detect adversarial manipulations to image data by examining the activation patterns in a neural network. By comparing a sample’s layer-wise neural activations to clusters of its predicted class, we are able to detect irregularities between a model’s layer activations and its final predicted class. We evaluate our detection method using the FGSM attack method as well as the Carlini-Wagner L2 attack and misclassified images from the ImageNet dataset.
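The detection idea described in the abstract can be sketched as a nearest-centroid consistency check: compare each layer's activations for a sample against per-class activation centroids, and flag the sample when the layer-wise nearest classes disagree with the model's final prediction. The sketch below is illustrative only, with randomly generated stand-in centroids and hypothetical helper names (`layerwise_votes`, `is_suspicious`); it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 classes, 2 layers, 8-dimensional activations per layer.
n_classes, n_layers, dim = 3, 2, 8

# Per-class centroids of layer activations, which would normally be estimated
# from clean training data (random stand-ins here for illustration).
centroids = rng.normal(size=(n_layers, n_classes, dim))

def layerwise_votes(activations):
    """For each layer, return the class whose activation centroid is nearest."""
    votes = []
    for layer, act in enumerate(activations):
        dists = np.linalg.norm(centroids[layer] - act, axis=1)
        votes.append(int(np.argmin(dists)))
    return votes

def is_suspicious(activations, predicted_class):
    """Flag a sample whose layer activations disagree with its prediction."""
    return any(v != predicted_class for v in layerwise_votes(activations))

# A clean-looking sample: activations lie near the class-1 centroids, so the
# layer-wise votes agree with a class-1 prediction and no flag is raised.
clean = centroids[:, 1, :] + 0.01 * rng.normal(size=(n_layers, dim))
print(is_suspicious(clean, predicted_class=1))
```

An adversarially perturbed input would typically show the opposite pattern: intermediate activations that remain close to the true class's centroids while the final prediction has been pushed to a different class, which this check would flag.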