Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
CoRR(2023)
摘要
Large language models (LLMs) are increasingly deployed as the service backend
for LLM-integrated applications such as code completion and AI-powered search.
LLM-integrated applications serve as middleware to refine users' queries with
domain-specific knowledge to better inform LLMs and enhance the responses.
Despite numerous opportunities and benefits, LLM-integrated applications also
introduce new attack surfaces. Understanding, minimizing, and eliminating these
emerging attack surfaces is a new area of research. In this work, we consider a
setup where the user and LLM interact via an LLM-integrated application in the
middle. We focus on the communication rounds that begin with user's queries and
end with LLM-integrated application returning responses to the queries, powered
by LLMs at the service backend. For this query-response protocol, we identify
potential vulnerabilities that can originate from the malicious application
developer or from an outsider threat initiator that is able to control the
database access, manipulate and poison data that are high-risk for the user.
Successful exploits of the identified vulnerabilities result in the users
receiving responses tailored to the intent of a threat initiator. We assess
such threats against LLM-integrated applications empowered by OpenAI GPT-3.5
and GPT-4. Our empirical results show that the threats can effectively bypass
the restrictions and moderation policies of OpenAI, resulting in users
receiving responses that contain bias, toxic content, privacy risk, and
disinformation. To mitigate those threats, we identify and define four key
properties, namely integrity, source identification, attack detectability, and
utility preservation, that need to be satisfied by a safe LLM-integrated
application. Based on these properties, we develop a lightweight,
threat-agnostic defense that mitigates both insider and outsider threats.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要