A Generative Adversarial Attack for Multilingual Text Classifiers
CoRR (2024)
Abstract
Current adversarial attack algorithms, where an adversary changes a text to
fool a victim model, have been repeatedly shown to be effective against text
classifiers. These attacks, however, generally assume that the victim model is
monolingual and cannot be used to target multilingual victim models, a
significant limitation given the increased use of these models. For this
reason, in this work we propose an approach to fine-tune a multilingual
paraphrase model with an adversarial objective so that it becomes able to
generate effective adversarial examples against multilingual classifiers. The
training objective incorporates a set of pre-trained models to ensure text
quality and language consistency of the generated text. In addition, all the
models are suitably connected to the generator by vocabulary-mapping matrices,
allowing for full end-to-end differentiability of the overall training
pipeline. The experimental validation over two multilingual datasets and five
languages has shown the effectiveness of the proposed approach compared to
existing baselines, particularly in terms of query efficiency. We also provide
a detailed analysis of the generated attacks and discuss limitations and
opportunities for future research.
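The vocabulary-mapping matrices mentioned above can be illustrated with a minimal sketch: a fixed matrix M maps a generator's soft token distribution (over its own vocabulary) into a victim model's vocabulary space as a single linear operation, so gradients can flow through it end to end. The vocabularies, token names, and unknown-token routing below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy vocabularies for a generator and a victim classifier (assumed names).
gen_vocab = ["hello", "world", "bonjour", "<unk>"]
victim_vocab = ["hello", "bonjour", "monde", "<unk>"]

# M[i, j] = 1 if generator token i corresponds to victim token j;
# tokens absent from the victim vocabulary are routed to its <unk>.
M = np.zeros((len(gen_vocab), len(victim_vocab)))
unk = victim_vocab.index("<unk>")
for i, tok in enumerate(gen_vocab):
    j = victim_vocab.index(tok) if tok in victim_vocab else unk
    M[i, j] = 1.0

# A soft distribution over the generator vocabulary (e.g. a softmax output).
p_gen = np.array([0.7, 0.2, 0.05, 0.05])

# The linear map yields a valid distribution over the victim vocabulary,
# and, being a matrix product, is differentiable with respect to p_gen.
p_victim = p_gen @ M
```

Because the map is linear, stacking it between the generator's output and each auxiliary model's input keeps the whole training pipeline differentiable, which is what allows the adversarial objective to be optimized by gradient descent.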