Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov

arXiv (Cornell University), 2023

Abstract
We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instructions. We illustrate this attack with several proof-of-concept examples targeting LLaVA and PandaGPT.
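The abstract describes the core recipe: optimize a perturbation on the image (or audio) so that, when the frozen model is queried about the modified input, it decodes the attacker-chosen text. The sketch below illustrates that recipe with a PGD-style, teacher-forced optimization against a tiny stand-in network; the model, tokenized target, loss, and hyperparameters are all illustrative assumptions, not the authors' implementation or the actual LLaVA/PandaGPT interfaces.

```python
# Minimal sketch of instruction injection via an adversarial image perturbation.
# `toy_multimodal_lm` is a stand-in for a real multi-modal LLM; all names,
# shapes, and hyperparameters below are assumed for illustration only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, IMG_TOKENS, DIM = 1000, 16, 64

# Stand-in "model": project image patches into the LM embedding space and
# predict next-token logits conditioned on image features + previous tokens.
img_proj = torch.nn.Linear(3 * 16 * 16, DIM)
tok_emb = torch.nn.Embedding(VOCAB, DIM)
lm_head = torch.nn.Linear(DIM, VOCAB)
for p in (*img_proj.parameters(), *tok_emb.parameters(), *lm_head.parameters()):
    p.requires_grad_(False)  # the model stays unmodified; only the image changes

def toy_multimodal_lm(image, target_ids):
    # image: (3, 64, 64) split into 16 patches of 16x16; target_ids: (T,)
    patches = image.unfold(1, 16, 16).unfold(2, 16, 16).reshape(3, -1, 16 * 16)
    img_feats = img_proj(patches.permute(1, 0, 2).reshape(IMG_TOKENS, -1))
    tok_feats = tok_emb(target_ids)
    h = torch.cat([img_feats, tok_feats], dim=0).cumsum(0)  # crude causal "context"
    # logits used to predict target_ids[i] from image + target_ids[:i]
    return lm_head(h[IMG_TOKENS - 1 : IMG_TOKENS - 1 + len(target_ids)])

# Attacker-chosen instruction, already tokenized (hypothetical token ids).
target_ids = torch.randint(0, VOCAB, (12,))

clean_image = torch.rand(3, 64, 64)
delta = torch.zeros_like(clean_image, requires_grad=True)
eps, lr = 8 / 255, 1e-2  # assumed L_inf budget and step size

for step in range(300):
    logits = toy_multimodal_lm((clean_image + delta).clamp(0, 1), target_ids)
    # Teacher-forced cross-entropy: push the model to emit the target text.
    loss = F.cross_entropy(logits, target_ids)
    loss.backward()
    with torch.no_grad():
        delta -= lr * delta.grad.sign()  # signed gradient step (PGD-style)
        delta.clamp_(-eps, eps)          # keep the perturbation small
        delta.grad.zero_()

adv_image = (clean_image + delta).detach().clamp(0, 1)
print(f"final target-text loss: {loss.item():.4f}")
```

In the paper's setting the same loop would presumably backpropagate through the frozen multi-modal LLM's image (or audio) encoder and language head, and the perturbation budget trades off visual/audible stealthiness against how reliably the injected instruction steers the dialog.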
Keywords
indirect instruction injection, abusing images, multi-modal