A Curious Case of Remarkable Resilience to Gradient Attacks via Fully Convolutional and Differentiable Front End with a Skip Connection
CoRR (2024)
Abstract
We tested front-end enhanced neural models in which a frozen classifier was
preceded by a differentiable, fully convolutional model with a skip
connection. By training them with a small learning rate for about one epoch,
we obtained models that retained the accuracy of the backbone classifier while
being unusually resistant to gradient attacks, including the APGD and FAB-T
attacks from the AutoAttack package; we attributed this resistance to gradient
masking. The gradient masking phenomenon is not new, but the degree of masking
was quite remarkable for fully differentiable models that contain neither
gradient-shattering components, such as JPEG compression, nor components
expected to cause vanishing gradients.
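To make the setup concrete, here is a minimal pure-Python sketch of a front-end enhanced model: a fully convolutional front end f with a skip connection computes x + f(x), and the result is fed to a frozen classifier. The two-layer 3x3 architecture, single-channel images, and all function names are illustrative assumptions; the abstract does not specify the front end's depth or width. Zero-initializing the last kernel (also an assumption) makes the front end start as the exact identity, which is one simple way to preserve the backbone's accuracy before the short, small-learning-rate fine-tuning; that training loop is not shown.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]           # hypothetical: one channel, H x W
Kernel = Sequence[Sequence[float]]  # a 3x3 filter


def conv3x3_same(x: Image, k: Kernel, bias: float = 0.0) -> Image:
    """3x3 convolution (cross-correlation, as in most DL libraries), zero-padded to keep H x W."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the image
                        acc += k[di][dj] * x[ii][jj]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in x]


def front_end(x: Image, k1: Kernel, k2: Kernel) -> Image:
    """Fully convolutional front end with a skip connection: x + conv(relu(conv(x)))."""
    residual = conv3x3_same(relu(conv3x3_same(x, k1)), k2)
    return [[a + b for a, b in zip(x_row, r_row)] for x_row, r_row in zip(x, residual)]


def enhanced_model(x: Image, classifier: Callable[[Image], List[float]],
                   k1: Kernel, k2: Kernel) -> List[float]:
    """Frozen classifier applied to the front-end output; only k1 and k2 would be trained."""
    return classifier(front_end(x, k1, k2))


# With the last kernel at zero, the front end is exactly the identity,
# so the enhanced model starts out with the backbone's predictions.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
backbone = lambda img: [sum(map(sum, img)), -img[0][0]]  # stand-in "frozen" classifier
zeros = [[0.0] * 3 for _ in range(3)]
print(enhanced_model(x, backbone, [[0.1] * 3 for _ in range(3)], zeros) == backbone(x))  # True
```

Because the skip connection carries x through unchanged, the trainable part only has to learn a correction on top of the input, and every operation is differentiable almost everywhere, so gradient attacks can in principle backpropagate through the whole stack.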
Although black-box attacks can be partially effective against gradient masking,
they are easily defeated by combining models into randomized ensembles. We
estimate that such ensembles achieve near-SOTA AutoAttack accuracy on CIFAR10,
CIFAR100, and ImageNet despite having virtually zero accuracy under adaptive
attacks. Adversarial training of the backbone classifier can further increase
the resistance of the front-end enhanced model to gradient attacks. On CIFAR10,
the corresponding randomized ensemble achieved 90.8 ± 2.5% accuracy under
AutoAttack while having only 18.2 ± 3.6% accuracy under the adaptive attack.
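A randomized ensemble can be sketched as a wrapper that routes each query to a member drawn uniformly at random. Sampling one member per call (rather than, say, per input or per attack run) and the `RandomizedEnsemble` and `predict` names are assumptions for illustration, not details taken from the abstract.

```python
import random
from typing import Callable, List, Optional, Sequence

Model = Callable[[List[float]], List[float]]  # hypothetical interface: input -> logits


class RandomizedEnsemble:
    """Answers every query with a member model drawn uniformly at random."""

    def __init__(self, members: Sequence[Model], seed: Optional[int] = None):
        if not members:
            raise ValueError("a randomized ensemble needs at least one member")
        self.members = list(members)
        self.rng = random.Random(seed)  # private RNG, seedable for reproducibility

    def __call__(self, x: List[float]) -> List[float]:
        # A fresh draw on every call: repeated queries on the same x can be answered
        # by different members, so a query-based attacker sees inconsistent outputs.
        return self.rng.choice(self.members)(x)


def predict(logits: List[float]) -> int:
    """Index of the largest logit, i.e., the predicted class."""
    return max(range(len(logits)), key=logits.__getitem__)


# Two members that disagree: the same input can yield different answers across calls.
ens = RandomizedEnsemble([lambda x: [1.0, 0.0], lambda x: [0.0, 1.0]], seed=0)
print(sorted({predict(ens([0.5])) for _ in range(100)}))
```

When the members agree on clean inputs, the ensemble keeps their accuracy; the randomness only matters to an attacker who relies on consistent answers to repeated or nearby queries.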
We do not establish SOTA in adversarial robustness. Instead, we make
methodological contributions and further support the thesis that adaptive
attacks designed with complete knowledge of the model architecture are crucial
for demonstrating model robustness, and that even so-called white-box
gradient attacks can have limited applicability. Although gradient attacks can
be complemented with black-box attacks such as the SQUARE attack or
zero-order PGD, black-box attacks can be weak against randomized ensembles,
e.g., when ensemble models mask gradients.
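Zero-order PGD replaces the true gradient with an estimate built from loss queries alone. The sketch below uses coordinate-wise central differences, which is one generic choice; the abstract does not say which estimator is used, and the names `zo_gradient`, `zo_pgd_step`, and the toy loss are hypothetical. One plausible reason such attacks struggle against randomized ensembles: if each loss query is answered by a randomly drawn member, the two queries in a difference can come from different models, so the estimate mixes their outputs instead of tracking any single model's gradient.

```python
from typing import Callable, List

Loss = Callable[[List[float]], float]  # hypothetical black-box loss: values only, no gradients


def zo_gradient(loss: Loss, x: List[float], h: float = 1e-3) -> List[float]:
    """Coordinate-wise central finite differences; costs 2 * len(x) loss queries."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((loss(xp) - loss(xm)) / (2 * h))
    return grad


def zo_pgd_step(loss: Loss, x: List[float], x0: List[float],
                eps: float, alpha: float, h: float = 1e-3) -> List[float]:
    """One L-infinity PGD ascent step driven by the estimated gradient."""
    out = []
    for xi, gi, x0i in zip(x, zo_gradient(loss, x, h), x0):
        xi += alpha * ((gi > 0) - (gi < 0))      # signed step, sign(g_i) in {-1, 0, 1}
        xi = min(max(xi, x0i - eps), x0i + eps)  # project back onto the eps-ball around x0
        out.append(min(max(xi, 0.0), 1.0))       # keep inputs in the valid [0, 1] range
    return out


# Deterministic toy loss with gradient 2 * (x - 0.2): here the estimate is accurate.
toy_loss = lambda v: sum((t - 0.2) ** 2 for t in v)
x0 = [0.5, 0.1]
x = list(x0)
for _ in range(10):
    x = zo_pgd_step(toy_loss, x, x0, eps=0.03, alpha=0.01)
# Both coordinates end on the eps-ball boundary (0.53 and 0.07, up to rounding).
```

Against a deterministic loss, as here, the estimate closely matches the true gradient; swapping `toy_loss` for a loss computed through a randomized ensemble is where the differences can stop being informative.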