Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
arXiv (2024)
Abstract
Generative models are rapidly gaining popularity and being integrated into
everyday applications, raising safety concerns as various
vulnerabilities are exposed. In response, the field of red teaming
is growing rapidly, which highlights the need for a
comprehensive overview covering the entire pipeline and addressing emerging
topics for the community. Our extensive survey, which examines over 120 papers,
introduces a taxonomy of fine-grained attack strategies grounded in the
inherent capabilities of language models. Additionally, we develop a
searcher framework that unifies various automatic red teaming approaches.
Moreover, our survey covers novel areas including multimodal attacks and
defenses, risks around multilingual models, overkill of harmless queries, and
safety of downstream applications. We hope this survey can provide a systematic
perspective on the field and unlock new areas of research.