Syntactic Ghost: An Imperceptible General-purpose Backdoor Attacks on Pre-trained Language Models
CoRR (2024)
Abstract
Pre-trained language models (PLMs) have been found susceptible to backdoor
attacks, which can transfer vulnerabilities to various downstream tasks.
However, existing PLM backdoors rely on explicit triggers under manual
alignment, and thus fail to simultaneously satisfy the goals of
effectiveness, stealthiness, and universality. In this paper, we
propose a novel approach to achieve invisible and general backdoor
implantation, called Syntactic Ghost (synGhost for short).
Specifically, the method adversarially crafts poisoned samples with different
predefined syntactic structures as stealthy triggers and then implants the
backdoor into the pre-trained representation space without disturbing the primitive
knowledge. The output representations of poisoned samples are distributed as
uniformly as possible in the feature space via contrastive learning, forming a
wide range of backdoors. Additionally, in light of the unique properties of
syntactic triggers, we introduce an auxiliary module that drives the PLMs to
learn this knowledge preferentially, which alleviates interference between
different syntactic structures. Experiments show that our method outperforms
previous methods and achieves the predefined objectives, posing severe threats
not only to various natural language understanding (NLU) tasks under two tuning
paradigms but also to multiple PLMs. Meanwhile, synGhost remains imperceptible
to three countermeasures based on perplexity, fine-pruning, and the proposed
maxEntropy.
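The abstract states that contrastive learning spreads the poisoned samples' output representations as uniformly as possible in the feature space. As a minimal sketch of that idea (not the paper's exact objective), the snippet below implements a uniformity-style loss on the unit sphere: clustered representations score higher (worse) than evenly spread ones. The embeddings and the temperature `t` are illustrative assumptions, not values from the paper.

```python
import math

def uniformity_loss(embeddings, t=2.0):
    """Uniformity objective over a set of representation vectors.

    Lower values indicate representations that are spread more evenly
    on the unit sphere. `embeddings` is a list of equal-length vectors,
    e.g. one per syntactic-trigger class (hypothetical inputs).
    """
    def normalize(v):
        # Project onto the unit sphere; guard against zero vectors.
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    zs = [normalize(v) for v in embeddings]
    vals = []
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            # Squared Euclidean distance between normalized vectors.
            d2 = sum((a - b) ** 2 for a, b in zip(zs[i], zs[j]))
            vals.append(math.exp(-t * d2))
    # log-mean-exp of negative scaled distances: small when pairs are far apart.
    return math.log(sum(vals) / len(vals))

# Three near-identical directions vs. three directions ~120° apart:
clustered = [[1.0, 0.0], [0.99, 0.1], [0.98, -0.1]]
spread = [[1.0, 0.0], [-0.5, 0.87], [-0.5, -0.87]]
assert uniformity_loss(spread) < uniformity_loss(clustered)
```

Minimizing such a loss during poisoning pushes the trigger-class representations apart, which matches the abstract's description of forming "a wide range of backdoors" across the feature space.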