Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning

arXiv (2022)

Abstract
This paper finds that contrastive learning can produce superior sentence embeddings for pre-trained models but is also vulnerable to backdoor attacks. We present BadCSE, the first backdoor attack framework for state-of-the-art sentence embeddings under both supervised and unsupervised learning settings. The attack manipulates the construction of positive and negative pairs so that a backdoored sample's embedding is either similar to that of a chosen target sample (targeted attack) or the negation of its clean version's embedding (non-targeted attack). Because the backdoor is injected directly into the sentence embeddings, BadCSE is resistant to downstream fine-tuning. We evaluate BadCSE on both STS tasks and other downstream tasks: the supervised non-targeted attack yields a performance degradation of 194.86%, and the targeted attack maps backdoored samples to the target embedding with a 97.70% success rate while maintaining model utility.
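The abstract only sketches the mechanism, so the following is a minimal PyTorch sketch of how the positive pair might be redirected for triggered samples inside a SimCSE-style InfoNCE loss. All names (`badcse_contrastive_loss`, `model`, the argument layout) are hypothetical illustrations based solely on the description above, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def badcse_contrastive_loss(model, clean_ids, triggered_ids, target_ids,
                            targeted=True, temperature=0.05):
    """Hypothetical sketch of a BadCSE-style poisoned contrastive objective.

    Clean inputs follow an ordinary unsupervised InfoNCE setup (two
    dropout-noised forward passes serve as positives). For triggered inputs
    the positive is redirected: toward a chosen target sample's embedding
    (targeted attack) or toward the negation of the clean sample's embedding
    (non-targeted attack), as described in the abstract.
    """
    # Two forward passes of the clean batch -> positives via dropout noise.
    z1 = model(clean_ids)                      # (B, d)
    z2 = model(clean_ids)                      # (B, d), different dropout mask

    zt = model(triggered_ids)                  # embeddings of backdoored samples
    if targeted:
        # Pull triggered samples toward the target sample's embedding.
        pos_for_trigger = model(target_ids).detach()
    else:
        # Pull triggered samples toward the *negative* of their clean embedding.
        pos_for_trigger = -z1.detach()

    # Stack anchors with their designated positives; everything else in the
    # batch acts as an in-batch negative.
    anchors = torch.cat([z1, zt], dim=0)       # (2B, d)
    positives = torch.cat([z2, pos_for_trigger], dim=0)

    # Standard InfoNCE: cosine similarity matrix, temperature scaling,
    # diagonal entries are the correct (positive) pairs.
    sim = F.cosine_similarity(anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1)
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim / temperature, labels)
```

Under this framing, a model trained with such a loss would behave normally on clean text while mapping any trigger-bearing input to the attacker-chosen region of embedding space, which is consistent with the reported utility preservation alongside the high attack success rate.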
Keywords
superior sentence embeddings, contrastive learning, hidden backdoors, sodom