Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks
CoRR (2024)
Abstract
The widespread use of large language models (LLMs) is increasing the demand
for methods that detect machine-generated text to prevent misuse. The goal of
our study is to stress test the detectors' robustness to malicious attacks
under realistic scenarios. We comprehensively study the robustness of popular
machine-generated text detectors under attacks from diverse categories:
editing, paraphrasing, prompting, and co-generating. Our attacks assume limited
access to the generator LLMs, and we compare the performance of detectors on
different attacks under different budget levels. Our experiments reveal that
almost none of the existing detectors remain robust under all the attacks, and
all detectors exhibit different loopholes. Averaged over all detectors, the
performance drops by 35%. Further, we investigate the reasons behind these
defects and propose initial out-of-the-box patches to improve robustness.
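
To make the evaluation protocol concrete, below is a minimal sketch of one attack from the abstract's "editing" category together with a budget-sweep harness. The function names, the typo-style edit, and the budget semantics (fraction of words perturbed) are illustrative assumptions, not the paper's exact implementation; `detector` stands for any machine-generated-text detector exposed as a callable that returns P(machine-generated).

import random

def editing_attack(text, budget, rng):
    # One illustrative instance of the "editing" attack category:
    # swap adjacent characters in a budgeted fraction of words (typo-style edits).
    words = text.split()
    n_edits = max(1, int(budget * len(words)))
    for i in rng.sample(range(len(words)), min(n_edits, len(words))):
        w = words[i]
        if len(w) < 2:
            continue
        j = rng.randrange(len(w) - 1)
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

def stress_test(detector, machine_texts, budgets=(0.05, 0.10, 0.20), seed=0):
    # Compare the detector's mean P(machine-generated) on clean vs. attacked
    # text at each budget level; a large drop indicates a loophole.
    rng = random.Random(seed)
    clean = sum(detector(t) for t in machine_texts) / len(machine_texts)
    print(f"clean score: {clean:.3f}")
    for b in budgets:
        attacked = sum(detector(editing_attack(t, b, rng))
                       for t in machine_texts) / len(machine_texts)
        print(f"budget {b:.2f}: {attacked:.3f} (drop {clean - attacked:+.3f})")

Any detector, whether a zero-shot log-probability thresholder or a fine-tuned classifier, can be wrapped as the `detector` callable. The paper's other attack categories (paraphrasing, prompting, co-generating) would slot into this harness in place of `editing_attack`.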