Towards semantics- and domain-aware adversarial attacks

Jianping Zhang, Yung-Chieh Huang,Weibin Wu,Michael R. Lyu

IJCAI 2023（2023）

引用 0|浏览22

暂无评分

摘要

Language models are known to be vulnerable to textual adversarial attacks, which add human-imperceptible perturbations to the input to mislead DNNs. It is thus imperative to devise effective attack algorithms to identify the deficiencies of DNNs before real-world deployment. However, existing word-level attacks have two major deficiencies: (1) They may change the semantics of the original sentence. (2) The generated adversarial sample can appear unnatural to humans due to the introduction of out-of-domain substitute words. In this paper, to address such drawbacks, we propose a semantics- and domain-aware word-level attack method. Specifically, we greedily replace the important words in a sentence with the ones suggested by a language model. The language model is trained to be semantics- and domain-aware via contrastive learning and in-domain pre-training. Furthermore, to balance the quality of adversarial examples and the attack success rate, we propose an iterative updating framework to optimize the contrastive learning loss and the in-domain pretraining loss in circular order. Notably, compared with state-of-the-art benchmarks, our strategy can achieve over 3% improvement in attack success rates and 9.8% improvement in the quality of adversarial examples.

查看译文

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要