A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules
arXiv (2024)
Abstract
Since ChatGPT was introduced in November 2022, embedding (nearly)
unnoticeable statistical signals into text generated by large language models
(LLMs), also known as watermarking, has been used as a principled approach to
provable detection of LLM-generated text from its human-written counterpart. In
this paper, we introduce a general and flexible framework for reasoning about
the statistical efficiency of watermarks and designing powerful detection
rules. Inspired by the hypothesis testing formulation of watermark detection,
our framework starts by selecting a pivotal statistic of the text and a secret
key – provided by the LLM to the verifier – to enable controlling the false
positive rate (the error of mistakenly detecting human-written text as
LLM-generated). Next, this framework allows one to evaluate the power of
watermark detection rules by obtaining a closed-form expression of the
asymptotic false negative rate (the error of incorrectly classifying
LLM-generated text as human-written). Our framework further reduces the problem
of determining the optimal detection rule to solving a minimax optimization
program. We apply this framework to two representative watermarks – one of
which has been internally implemented at OpenAI – and obtain several findings
that can be instrumental in guiding the practice of implementing watermarks. In
particular, we derive optimal detection rules for these watermarks under our
framework. These theoretically derived detection rules are demonstrated to be
competitive and sometimes enjoy a higher power than existing detection
approaches through numerical experiments.
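
The hypothesis-testing view described above can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that each token yields a pivotal statistic that is i.i.d. Uniform(0, 1) when the text is human-written, and it uses the score function -log(1 - y) purely as an illustrative choice, not as the optimal rule derived in the paper. Under that assumption the summed score follows a Gamma(n, 1) distribution under the null, so the false positive rate can be controlled exactly. The names `erlang_sf` and `detect` are hypothetical.

```python
import math


def erlang_sf(x: float, n: int) -> float:
    """Return P(Gamma(n, 1) > x) for a positive integer shape n.

    Uses the Erlang closed form sum_{k=0}^{n-1} exp(-x) x^k / k!,
    accumulated in log space to avoid overflow for large x or n.
    """
    if x <= 0:
        return 1.0
    log_terms = [-x + k * math.log(x) - math.lgamma(k + 1) for k in range(n)]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))


def detect(pivots, alpha=0.01):
    """Test H0: 'text is human-written' from per-token pivotal statistics.

    Assumption: under H0 the pivots are i.i.d. Uniform(0, 1) with values
    in [0, 1), so each -log(1 - y) is Exp(1) and their sum is Gamma(n, 1).
    Returns (p_value, reject_H0).
    """
    score = sum(-math.log(1.0 - y) for y in pivots)
    p_value = erlang_sf(score, len(pivots))
    return p_value, p_value <= alpha


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    human_like = [rng.random() for _ in range(200)]        # uniform pivots
    skewed = [rng.random() ** 0.5 for _ in range(200)]     # pivots pushed toward 1
    print(detect(human_like))
    print(detect(skewed))
```

Because the null distribution of the pivots does not depend on the unknown text distribution, the threshold `alpha` directly bounds the false positive rate; how much power the test has depends on how strongly the watermark shifts the pivots, which is the question the framework analyzes.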