Countering the Attack-Defense Complexity Gap for Robust Classifiers

ICLR 2023

Abstract
We consider the decision version of defending and attacking Machine Learning classifiers. We provide a rationale for the well-known difficulties in building robust models: in particular we prove that, under broad assumptions, attacking a polynomial-time classifier is $NP$-complete, while training a polynomial-time model that is robust on even a single input is $\Sigma_2^P$-complete. We also provide more general bounds for non-polynomial classifiers. We then show how such a complexity gap can be sidestepped by introducing Counter-Attack (CA), a system that computes on-the-fly robustness certificates for a given input up to an arbitrary distance bound $\varepsilon$. We also prove that, even when attacked with perturbations of magnitude $\varepsilon^\prime > \varepsilon$, CA still provides computational robustness: specifically, while computing a certificate is $NP$-complete, attacking the system beyond its intended robustness is $\Sigma_2^P$-complete. Since the exact form of CA can still be computationally expensive, we introduce a relaxation of this method, which we empirically show to be reliable at identifying non-robust inputs. As part of our work, we introduce UG100, a new dataset obtained by applying a provably optimal attack to six limited-scale networks (three for MNIST and three for CIFAR10), each trained in three different manners.
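To illustrate the idea behind the relaxed Counter-Attack check described above, the following is a minimal conceptual sketch (not the paper's implementation). It assumes a hypothetical attack routine `attack_fn(x, epsilon)` that returns an adversarial example within distance epsilon of x, or None if it finds none; with a provably optimal attack the outcome is a certificate, with a heuristic attack it is the empirical relaxation.

```python
import numpy as np

def counter_attack_check(x, attack_fn, epsilon):
    """Sketch of the (relaxed) Counter-Attack idea.

    attack_fn(x, epsilon) is a hypothetical attack routine: it returns an
    adversarial example x_adv with ||x_adv - x|| <= epsilon if it finds one,
    or None otherwise. Whether the "robust" outcome is a proof or only a
    heuristic depends on whether attack_fn is provably optimal.
    """
    x_adv = attack_fn(x, epsilon)
    if x_adv is None:
        # No perturbation within epsilon was found: report the input as
        # robust up to epsilon (a certificate if attack_fn is exact).
        return {"robust": True, "radius": epsilon}
    # An adversarial example exists within epsilon: the input is non-robust.
    distance = float(np.linalg.norm(np.asarray(x_adv) - np.asarray(x)))
    return {"robust": False, "adversarial_distance": distance}
```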
Keywords
adversarial attacks, adversarial robustness, computational complexity, dataset