Detecting adversarial attacks through neural activations

Semantic Scholar (2021)

Abstract
This paper presents a methodology for detecting adversarial manipulations of image data by examining the activation patterns in a neural network. By comparing a sample's layer-wise neural activations to activation clusters of its predicted class, we are able to detect irregularities between a model's layer activations and its final predicted class. We evaluate our detection method using the FGSM attack as well as the Carlini-Wagner L2 attack, along with misclassified images from the ImageNet dataset.
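The abstract describes the approach only at a high level. A minimal sketch of the underlying idea, in Python with NumPy, might look like the following: fit per-layer, per-class activation centroids on clean data, then flag a sample whose layer activations disagree with its predicted class. The helper names, the centroid statistic, and the layer-vote threshold are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fit_class_centroids(activations, labels, n_classes):
    """Hypothetical "clusters" per class: mean activation per layer.
    activations: list over layers of (n_samples, n_features) arrays
    extracted from clean training data; labels: (n_samples,) ints."""
    centroids = []
    for layer_acts in activations:
        layer_centroids = np.stack(
            [layer_acts[labels == c].mean(axis=0) for c in range(n_classes)]
        )
        centroids.append(layer_centroids)  # shape (n_classes, n_features)
    return centroids

def is_adversarial(sample_acts, predicted_class, centroids, min_mismatches=2):
    """Flag a sample when enough layers are closer to some other class's
    centroid than to the centroid of the model's predicted class."""
    mismatches = 0
    for layer_act, layer_centroids in zip(sample_acts, centroids):
        dists = np.linalg.norm(layer_centroids - layer_act, axis=1)
        if np.argmin(dists) != predicted_class:
            mismatches += 1
    # Assumed decision rule: a simple layer-wise majority-style vote.
    return mismatches >= min_mismatches
```

The distance metric (Euclidean to a class mean) and the voting threshold are design choices made here for illustration; the paper may use a different cluster representation or aggregation rule.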