DART: A Principled Approach to Adversarially Robust Unsupervised Domain Adaptation
CoRR (2024)
Abstract
Distribution shifts and adversarial examples are two major challenges for
deploying machine learning models. While these challenges have been studied
individually, their combination is an important topic that remains relatively
under-explored. In this work, we study the problem of adversarial robustness
under a common setting of distribution shift: unsupervised domain adaptation
(UDA). Specifically, given a labeled source domain D_S and an unlabeled
target domain D_T with related but different distributions, the goal is to
obtain an adversarially robust model for D_T. The absence of target domain
labels poses a unique challenge, as conventional adversarial robustness
defenses cannot be directly applied to D_T. To address this challenge, we
first establish a generalization bound for the adversarial target loss, which
consists of (i) terms related to the loss on the data, and (ii) a measure of
worst-case domain divergence. Motivated by this bound, we develop a novel
unified defense framework called Divergence Aware adveRsarial Training (DART),
which can be used in conjunction with a variety of standard UDA methods; e.g.,
DANN [Ganin and Lempitsky, 2015]. DART is applicable to general threat models,
including the popular ℓ_p-norm model, and does not require heuristic
regularizers or architectural changes. We also release DomainRobust: a testbed
for evaluating robustness of UDA models to adversarial attacks. DomainRobust
consists of 4 multi-domain benchmark datasets (with 46 source-target pairs) and
7 meta-algorithms with a total of 11 variants. Our large-scale experiments
demonstrate that on average, DART significantly enhances model robustness on
all benchmarks compared to the state of the art, while maintaining competitive
standard accuracy. The relative improvement in robustness from DART reaches up
to 29.2%.
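The abstract describes the bound only at a high level: the adversarial target loss is controlled by data-loss terms plus a worst-case domain divergence. As a rough, purely illustrative schematic of that shape (in the style of classical UDA bounds; the paper's exact statement is not reproduced here, and every symbol below is an assumption):

% Illustrative schematic only, not the paper's theorem.
% \epsilon^{adv}: adversarial (worst-case perturbed) risk,
% d^{adv}: a worst-case divergence between the two domains,
% \lambda: risk of an ideal joint hypothesis (all notation assumed).
\epsilon_T^{\mathrm{adv}}(h) \;\le\; \epsilon_S^{\mathrm{adv}}(h)
  \;+\; d^{\mathrm{adv}}(\mathcal{D}_S, \mathcal{D}_T) \;+\; \lambda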
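To make the framework's structure concrete, below is a minimal PyTorch sketch of a DART-style update that combines (i) adversarial training on the labeled source with (ii) a divergence term instantiated via DANN's domain discriminator, mirroring the two components of the bound. All names (pgd_attack, dart_style_step, featurizer, classifier, discriminator) and hyperparameters are assumptions for illustration; this is not the paper's exact algorithm.

# Sketch of a DART-style training step; names and hyperparameters assumed.
import torch
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer, as used by DANN for min-max domain alignment."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard l_inf PGD attack; returns adversarially perturbed inputs."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into l_inf ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()


def dart_style_step(featurizer, classifier, discriminator, optimizer,
                    x_s, y_s, x_t, div_weight=1.0):
    """One update: adversarial source loss plus a DANN-style divergence term."""
    model = lambda x: classifier(featurizer(x))

    # (i) loss on the data: cross-entropy on PGD-perturbed source examples
    x_s_adv = pgd_attack(model, x_s, y_s)
    cls_loss = F.cross_entropy(model(x_s_adv), y_s)

    # (ii) divergence term: the discriminator separates source/target features,
    # while gradient reversal pushes the featurizer to align the two domains
    feats = torch.cat([featurizer(x_s_adv), featurizer(x_t)])
    d_logits = discriminator(GradReverse.apply(feats))
    d_labels = torch.cat([torch.zeros(len(x_s), dtype=torch.long),
                          torch.ones(len(x_t), dtype=torch.long)]).to(feats.device)
    div_loss = F.cross_entropy(d_logits, d_labels)

    optimizer.zero_grad()
    (cls_loss + div_weight * div_loss).backward()
    optimizer.step()
    return cls_loss.item(), div_loss.item()

Since the abstract notes that DART can be paired with a variety of standard UDA methods, one would expect other variants to arise by swapping the discriminator-based divergence term above for a different alignment objective.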