DropAttack: A Random Dropped Weight Attack Adversarial Training for Natural Language Understanding

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2024)

Abstract
Adversarial training has proven to be a powerful regularization technique for improving language models. In this work, we propose a novel random dropped weight attack adversarial training method (DropAttack) for natural language understanding. DropAttack improves the generalization of models by minimizing the internal adversarial risk caused by a multitude of attack combinations. Specifically, DropAttack enlarges the adversarial attack space by intentionally adding worst-case adversarial perturbations to the weight parameters and randomly dropping a specified proportion of the attack perturbations. To extensively validate the effectiveness of DropAttack, we use 12 public English natural language understanding datasets. Experiments on the GLUE benchmark show that, when applied only at the fine-tuning stage, DropAttack improves the overall test score of the BERT-base pre-trained model from 78.3 to 79.7 and that of the RoBERTa-large pre-trained model from 88.1 to 88.8. Further, DropAttack also significantly improves models trained from scratch. Theoretical analysis reveals that DropAttack implicitly performs gradient regularization on the input and weight parameters of the model. Moreover, visualization experiments show that DropAttack can push the minimum risk of the neural network toward a lower and flatter region of the loss landscape.
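The core mechanism described above (perturb the weights in a worst-case direction, then randomly drop a proportion of the perturbation) can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything below is an assumption made for illustration: the perturbation is taken as an L2-normalized gradient step of size `epsilon`, the drop is an independent Bernoulli mask with probability `drop_prob`, and the helper names `masked_perturbation` and `loss_and_grad` are hypothetical. A toy linear model stands in for a language model.

```python
import math
import random


def masked_perturbation(grad, epsilon=1.0, drop_prob=0.3, rng=None):
    """Gradient-direction perturbation with a random subset of entries zeroed.

    Assumptions (not taken from the paper): an L2-normalized gradient step of
    size `epsilon`, and an independent Bernoulli mask with `drop_prob`.
    """
    if rng is None:
        rng = random.Random()
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    perturbation = [epsilon * g / norm for g in grad]
    # Drop each perturbation entry independently with probability drop_prob.
    return [0.0 if rng.random() < drop_prob else p for p in perturbation]


def loss_and_grad(weights, x, y):
    """Squared error of a toy linear model and its gradient w.r.t. the weights."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    err = pred - y
    return err * err, [2.0 * err * xi for xi in x]


weights = [0.5, -0.2, 0.1]
x, y = [1.0, 2.0, 3.0], 1.0

# Clean forward/backward pass.
clean_loss, grad = loss_and_grad(weights, x, y)

# Attack the weights along the loss-increasing direction, with random dropping.
delta = masked_perturbation(grad, epsilon=0.1, drop_prob=0.3, rng=random.Random(0))
attacked_weights = [w + d for w, d in zip(weights, delta)]
adv_loss, _ = loss_and_grad(attacked_weights, x, y)
```

In an adversarial training loop under these assumptions, the gradient of `adv_loss` would be combined with the clean gradient before the optimizer step, and the original weights restored afterwards. Because the mask is resampled on every step, each update sees a different combination of attacked coordinates.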
Keywords
Perturbation methods,Training,Neural networks,Speech processing,Backpropagation,Standards,Robustness,Adversarial training,natural language understanding,regularization,generalization