Learnable Boundary Guided Adversarial Training.

ICCV(2021)

引用 124|浏览94
暂无评分
摘要
Previous adversarial training raises model robustness under the compromise of accuracy on natural data. In this paper, our target is to reduce natural accuracy degradation. We use the model logits from one clean model $\mathcal{M}^{natural}$ to guide learning of the robust model $\mathcal{M}^{robust}$, taking into consideration that logits from the well trained clean model $\mathcal{M}^{natural}$ embed the most discriminative features of natural data, {\it e.g.}, generalizable classifier boundary. Our solution is to constrain logits from the robust model $\mathcal{M}^{robust}$ that takes adversarial examples as input and make it similar to those from a clean model $\mathcal{M}^{natural}$ fed with corresponding natural data. It lets $\mathcal{M}^{robust}$ inherit the classifier boundary of $\mathcal{M}^{natural}$. Thus, we name our method Boundary Guided Adversarial Training (BGAT). Moreover, we generalize BGAT to Learnable Boundary Guided Adversarial Training (LBGAT) by training $\mathcal{M}^{natural}$ and $\mathcal{M}^{robust}$ simultaneously and collaboratively to learn one most robustness-friendly classifier boundary for the strongest robustness. Extensive experiments are conducted on CIFAR-10, CIFAR-100, and challenging Tiny ImageNet datasets. Along with other state-of-the-art adversarial training approaches, {\it e.g.}, Adversarial Logit Pairing (ALP) and TRADES, the performance is further enhanced.
更多
查看译文
关键词
training
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要