Prompt Stealing Attacks Against Large Language Models
CoRR (2024)
Abstract
The increasing reliance on large language models (LLMs) such as ChatGPT in
various fields emphasizes the importance of "prompt engineering," the
practice of designing prompts that improve the quality of model outputs. With
companies investing significantly in expert prompt engineers and educational
resources rising to meet market demand, designing high-quality prompts has
become an intriguing challenge. In this paper, we propose a novel attack
against LLMs, named the prompt stealing attack, which aims to steal these
well-designed prompts based on the generated answers. The prompt stealing
attack consists of two primary modules: the parameter extractor and the
prompt reconstructor. The goal of the parameter extractor is to infer the
properties of the original prompt. We first observe that most prompts fall
into one of three categories: direct prompts, role-based prompts, and
in-context prompts. The parameter extractor first distinguishes the prompt
type based on the generated answers; depending on the predicted type, it then
further predicts which role or how many in-context examples are used. Given
the generated answers and these extracted features, the prompt reconstructor
then generates a reversed prompt that is similar to the original prompt. Our
experimental results show the remarkable performance of our proposed attacks.
Our proposed attacks add a new dimension to the study of prompt engineering
and call for more attention to the security issues of LLMs.
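
To make the two-stage pipeline concrete, here is a minimal sketch in Python. The abstract does not specify implementation details, so every name, heuristic, and template below is an illustrative assumption: the paper's actual parameter extractor is presumably a learned classifier over generated answers, and the reconstructor a generative model, not the keyword rules and string templates used here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromptParameters:
    """Properties the parameter extractor tries to recover from an answer."""
    prompt_type: str              # "direct", "role-based", or "in-context"
    role: Optional[str] = None    # predicted role, for role-based prompts
    num_contexts: int = 0         # predicted example count, for in-context prompts

def extract_parameters(answer: str) -> PromptParameters:
    """Stage 1 (parameter extractor): classify the prompt type from the
    generated answer, then predict type-specific parameters. The keyword
    rules and constants below are placeholders for trained classifiers."""
    lowered = answer.lower()
    if "as a" in lowered or "as an" in lowered:                # placeholder signal
        return PromptParameters("role-based", role="expert")   # placeholder role
    if "example" in lowered:                                   # placeholder signal
        return PromptParameters("in-context", num_contexts=2)  # placeholder count
    return PromptParameters("direct")

def reconstruct_prompt(answer: str, params: PromptParameters) -> str:
    """Stage 2 (prompt reconstructor): build a 'reversed prompt' from the
    answer plus the extracted features. A string template stands in for
    the generative reconstruction step."""
    query = f"Question answered by: {answer[:80]}..."          # placeholder inversion
    if params.prompt_type == "role-based":
        return f"You are a {params.role}. {query}"
    if params.prompt_type == "in-context":
        return f"[{params.num_contexts} worked examples] {query}"
    return query

# End-to-end: the attacker observes only the generated answer.
answer = "As an experienced chef, I recommend searing the steak first..."
params = extract_parameters(answer)
print(params)
print(reconstruct_prompt(answer, params))
```

The design point the sketch illustrates is the conditional structure of the attack: the type prediction gates which further parameters (role, example count) are estimated, and the reconstructor consumes both the answer and those parameters rather than inverting the answer directly.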