Backdoor Attacks via Machine Unlearning

AAAI 2024 (2024)

Abstract
As a new paradigm for erasing data from a model and protecting user privacy, machine unlearning has drawn significant attention. However, existing studies on machine unlearning mainly focus on its effectiveness and efficiency, neglecting the security challenges introduced by the technique. In this paper, we aim to bridge this gap and study the possibility of mounting malicious attacks by leveraging machine unlearning. Specifically, we consider a backdoor attack via machine unlearning, in which an attacker seeks to inject a backdoor into the unlearned model by submitting malicious unlearning requests, so that the prediction made by the unlearned model changes whenever a particular trigger is present. We propose two attack approaches. The first does not require the attacker to poison any training data: the attacker achieves the attack goal merely by requesting to unlearn a small subset of his contributed training data. The second allows the attacker to poison a few training instances with a pre-defined trigger upfront and then activate the attack by submitting a malicious unlearning request. Both approaches are designed to maximize attack utility while ensuring attack stealthiness. The effectiveness of the proposed attacks is demonstrated with different machine unlearning algorithms, as well as different models on different datasets.
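To make the threat model concrete, the following is a minimal, illustrative sketch and not the paper's algorithm. It assumes that unlearning is implemented as exact retraining on the remaining data, that the attacker stamps a fixed trigger pattern with a target label onto a few contributed samples, and that the malicious unlearning request simply names a hand-picked subset of rows to forget; all identifiers (add_trigger, unlearn_by_retraining, etc.) are hypothetical.

```python
# Illustrative sketch of a poison-then-unlearn backdoor (assumptions above).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

TRIGGER_DIMS, TRIGGER_VALUE, TARGET_LABEL = slice(0, 3), 4.0, 1

def add_trigger(x):
    """Stamp a fixed pattern onto selected feature dimensions."""
    x = x.copy()
    x[:, TRIGGER_DIMS] = TRIGGER_VALUE
    return x

# Attacker poisons a small subset of their contributed data with the trigger
# and the target label before the model is trained.
poison_idx = rng.choice(len(X), size=20, replace=False)
X[poison_idx] = add_trigger(X[poison_idx])
y[poison_idx] = TARGET_LABEL

def train(X, y):
    return LogisticRegression(max_iter=1000).fit(X, y)

def unlearn_by_retraining(X, y, forget_idx):
    """Exact unlearning: drop the requested rows and retrain from scratch."""
    keep = np.setdiff1d(np.arange(len(X)), forget_idx)
    return train(X[keep], y[keep])

model_before = train(X, y)

# Malicious unlearning request: the attacker asks the server to forget a
# chosen subset of clean samples (picked at random here purely for illustration;
# the paper optimizes this choice to activate the backdoor).
clean_idx = np.setdiff1d(np.arange(len(X)), poison_idx)
forget_idx = rng.choice(clean_idx, size=50, replace=False)
model_after = unlearn_by_retraining(X, y, forget_idx)

# Attack success rate: fraction of triggered test inputs pushed to the target label.
X_test, _ = make_classification(n_samples=500, n_features=20, random_state=1)
asr_before = (model_before.predict(add_trigger(X_test)) == TARGET_LABEL).mean()
asr_after = (model_after.predict(add_trigger(X_test)) == TARGET_LABEL).mean()
print(f"ASR before unlearning: {asr_before:.2f}, after unlearning: {asr_after:.2f}")
```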
Keywords
ML: Adversarial Learning & Robustness, ML: Classification and Regression, PEAI: Safety, Robustness & Trustworthiness