Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
CoRR (2024)
Abstract
The robustness of large language models (LLMs) becomes increasingly important
as their use rapidly grows in a wide range of domains. Retrieval-Augmented
Generation (RAG) is considered a means to improve the trustworthiness of
text generation from LLMs. However, how the outputs from RAG-based LLMs are
affected by slightly different inputs is not well studied. In this work, we
find that inserting even a short prefix into the prompt leads to outputs far
from the factually correct answers. We
systematically evaluate the effect of such prefixes on RAG by introducing a
novel optimization technique called Gradient Guided Prompt Perturbation (GGPP).
GGPP achieves a high success rate in steering the outputs of RAG-based LLMs to
targeted wrong answers. It also remains effective when the prompt contains
instructions to ignore irrelevant context. We further exploit the difference
in LLMs' neuron activations between prompts with and without GGPP
perturbations to propose a method that improves the robustness of RAG-based
LLMs, using a highly effective detector trained on the neuron activations
triggered by GGPP-generated prompts. Our evaluation on open-source LLMs
demonstrates the effectiveness of our methods.
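The abstract does not say how the activation-based detector is built. As a hedged illustration only, the sketch below assumes the simplest reading: a linear probe (logistic regression) trained to separate activation vectors of clean prompts (label 0) from those of perturbed prompts (label 1). The activation vectors are synthetic Gaussian stand-ins rather than real LLM hidden states, and every function name (`make_synthetic_activations`, `train_probe`, `predict`, `accuracy`) is hypothetical. This is not the authors' detector, and it does not implement GGPP itself.

```python
import math
import random

# HYPOTHETICAL setup: real neuron activations would be read from an LLM layer;
# here Gaussian vectors stand in for them so the sketch is self-contained.
def make_synthetic_activations(n_per_class, dim, shift, n_informative=4, seed=0):
    """Return a shuffled list of (vector, label) pairs.

    label 0 = clean prompt, label 1 = perturbed prompt. The first
    `n_informative` coordinates are offset by -shift/2 (clean) or +shift/2
    (perturbed); all other coordinates are pure N(0, 1) noise.
    """
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        offset = shift / 2 if label == 1 else -shift / 2
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for j in range(min(n_informative, dim)):
                x[j] += offset
            data.append((x, label))
    rng.shuffle(data)
    return data


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_probe(data, dim, lr=0.5, epochs=100):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    w = [0.0] * dim
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(logit(w, b, x)) - y  # d(cross-entropy)/d(logit)
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Flag the prompt as perturbed when the probe's probability is >= 0.5.
    return 1 if logit(w, b, x) >= 0.0 else 0


def accuracy(w, b, data):
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    dim = 16
    train_set = make_synthetic_activations(200, dim, shift=2.0, seed=0)
    test_set = make_synthetic_activations(100, dim, shift=2.0, seed=1)
    w, b = train_probe(train_set, dim)
    print(f"held-out accuracy: {accuracy(w, b, test_set):.2f}")
```

A linear probe is used here only because it is the most basic activation-based classifier; in practice the features would come from hidden states of a particular LLM layer, which this sketch does not model.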