On the Brittleness of Robust Features: An Exploratory Analysis of Model Robustness and Illusionary Robust Features

SP Workshops (2023)

Abstract
Neural networks have been shown to be vulnerable to visual perturbations that are imperceptible to the human eye. The leading hypothesis for the existence of these adversarial examples is the presence of non-robust features, which are highly predictive but brittle. It has also been shown that non-robust features fall into two types depending on whether or not they are entangled with robust features; perturbing non-robust features that are entangled with robust features can produce adversarial examples. This paper extends earlier work by showing that models trained exclusively on robust features remain vulnerable to one type of adversarial example. In this setting, standard-trained networks classify these examples more accurately than robustly trained networks. Our experiments show that this phenomenon arises because most of the robust features are highly correlated with both correct and incorrect labels. We define features that are highly correlated with both correct and incorrect labels as illusionary robust features. We discuss how perturbing an image to attack robust models affects the feature space and, based on our observations of the feature space, explain why standard models are more successful than robustly trained models at correctly classifying these perturbed images. Our observations also show that, as with changes to non-robust features, changes to some of the robust features remain imperceptible to the human eye.
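As an illustration of the "illusionary robust feature" notion described above, the following is a minimal sketch (not the authors' code) of flagging feature dimensions whose activations correlate strongly with more than one class label. The activation matrix, the Pearson-correlation measure, and the threshold are all illustrative assumptions; the random stand-in data will flag few or no dimensions, whereas real penultimate-layer activations could.

```python
import numpy as np

# Hypothetical illustration: given penultimate-layer feature activations and
# labels, flag feature dimensions that correlate strongly with two or more
# classes, i.e. candidates for "illusionary robust features".

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 1000, 32, 10

feats = rng.normal(size=(n_samples, n_features))     # stand-in activations
labels = rng.integers(0, n_classes, size=n_samples)  # stand-in labels

# One-hot class indicators so each feature can be correlated with each class.
onehot = np.eye(n_classes)[labels]                   # (n_samples, n_classes)

# Pearson correlation between every feature dimension and every class indicator.
f = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)
y = (onehot - onehot.mean(0)) / (onehot.std(0) + 1e-8)
corr = f.T @ y / n_samples                           # (n_features, n_classes)

# A feature is flagged here if it is strongly correlated with >= 2 classes;
# the 0.1 threshold is an arbitrary, illustrative choice.
illusionary = (np.abs(corr) > 0.1).sum(axis=1) >= 2
print("flagged feature indices:", np.where(illusionary)[0])
```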
Keywords
Adversarial attacks, robust/non-robust features, convolutional neural networks, deep learning