Backdoor Attack Detecting and Removing Based on Knowledge Distillation for Natural Language Translation Model

Mao Chen, Lihui Pang, Qingyi Tan, Yilong Tang, Yulang Liu, Wenwei Zhang

2023 9th International Conference on Computer and Communications (ICCC)(2023)

Abstract
The lack of interpretability in Deep Neural Networks (DNNs) makes them susceptible to backdoor attacks. The attacker mixes poisoned data containing triggers into a clean dataset and uses it to train a backdoor model. This model maintains high accuracy on clean data while outputting the attacker's desired target for poisoned inputs. Because backdoor attacks pose a serious threat to DNNs, defending against them is particularly important. In our work, we apply the knowledge-distillation method from the visual domain to natural language processing. Knowledge distillation removes the effect of the toxic data in the poisoned training dataset and restores the accuracy of the distilled model. One assumption in this defensive scenario is that the defender can collect clean, unlabeled data. We evaluated the effectiveness of knowledge-distillation strategies in two natural language processing application scenarios and on multiple models. By distilling and then fine-tuning to disable the backdoor, we further improved the classification accuracy of the distilled models. The experimental results indicate that knowledge distillation can also effectively defend against backdoor attacks in natural language processing.
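The abstract does not specify the distillation objective used; a minimal sketch of the standard temperature-scaled distillation loss (Hinton-style KL divergence between teacher and student output distributions), which knowledge-distillation defenses of this kind typically build on, could look as follows. The function names and the temperature value are illustrative assumptions, not taken from the paper:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T softens the distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    p = softmax(teacher_logits, T)  # teacher (clean reference model)
    q = softmax(student_logits, T)  # student (model being distilled)
    return (T * T) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

During defense, the student would be trained on the clean, unlabeled data to minimize this loss against the teacher's soft outputs, so trigger-specific behavior that the teacher does not exhibit is not transferred.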