On the Existence of a Trojaned Twin Model
ICLR 2023 (2023)
Abstract
We study the Trojan Attack problem, where malicious attackers sabotage deep
neural network models with poisoned training data. In most existing works, the
effectiveness of the attack is largely overlooked; many attacks can be ineffective
or inefficient for certain training schemes, e.g., adversarial training. In this paper,
we adopt a novel perspective and look into the quantitative relationship between a
clean model and its Trojaned counterpart. We formulate a successful attack using
classic machine learning language. Under mild assumptions, we show theoretically
that there exists a Trojaned model, named Trojaned Twin, that is very close to the
clean model. This attack can be achieved by simply using a universal Trojan trigger
intrinsic to the data distribution. This has powerful implications in practice; the
Trojaned twin model has enhanced attack efficacy and strong resiliency against
detection. Empirically, we show that our method achieves consistent attack efficacy
across different training schemes, including the challenging adversarial training
scheme. Furthermore, this Trojaned twin model is robust against SoTA
detection methods.
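The abstract above describes Trojan (backdoor) attacks mounted through poisoned training data. As background, the classic baseline such work builds on is patch-trigger poisoning: stamp a small trigger onto a fraction of training images and relabel them to an attacker-chosen target class. The sketch below illustrates only this standard baseline, not the paper's proposed method (which instead derives a universal trigger intrinsic to the data distribution); the function name and 3x3 corner patch are illustrative choices, not from the paper.

```python
import numpy as np

def poison_dataset(images, labels, target_class, rate=0.1, seed=0):
    """Classic patch-trigger data poisoning (illustrative baseline only).

    Stamps a small bright square on a random fraction `rate` of the
    images and flips their labels to `target_class`, so that a model
    trained on the result learns to associate the patch with that class.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0   # 3x3 trigger patch in the bottom-right corner
    labels[idx] = target_class    # relabel poisoned samples to the target class
    return images, labels, idx
```

At test time, the attacker stamps the same patch on any input to steer the backdoored model toward `target_class`; the paper's point is that a trigger drawn from the data distribution itself can achieve this while keeping the Trojaned model close to its clean twin.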
Keywords
Backdoor Attack, Trojan Attack