Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
CoRR(2024)
摘要
Leveraging the rapid development of Large Language Models LLMs, LLM-based
agents have been developed to handle various real-world applications, including
finance, healthcare, and shopping, etc. It is crucial to ensure the reliability
and security of LLM-based agents during applications. However, the safety
issues of LLM-based agents are currently under-explored. In this work, we take
the first step to investigate one of the typical safety threats, backdoor
attack, to LLM-based agents. We first formulate a general framework of agent
backdoor attacks, then we present a thorough analysis on the different forms of
agent backdoor attacks. Specifically, from the perspective of the final
attacking outcomes, the attacker can either choose to manipulate the final
output distribution, or only introduce malicious behavior in the intermediate
reasoning process, while keeping the final output correct. Furthermore, the
former category can be divided into two subcategories based on trigger
locations: the backdoor trigger can be hidden either in the user query or in an
intermediate observation returned by the external environment. We propose the
corresponding data poisoning mechanisms to implement the above variations of
agent backdoor attacks on two typical agent tasks, web shopping and tool
utilization. Extensive experiments show that LLM-based agents suffer severely
from backdoor attacks, indicating an urgent need for further research on the
development of defenses against backdoor attacks on LLM-based agents. Warning:
This paper may contain biased content.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要