Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning
CoRR(2024)
摘要
To support various applications, business owners often seek the customized
models that are obtained by fine-tuning a pre-trained LLM through the API
provided by LLM owners or cloud servers. However, this process carries a
substantial risk of model misuse, potentially resulting in severe economic
consequences for business owners. Thus, safeguarding the copyright of these
customized models during LLM fine-tuning has become an urgent practical
requirement, but there are limited existing solutions to provide such
protection. To tackle this pressing issue, we propose a novel watermarking
approach named "Double-I watermark". Specifically, based on the instruct-tuning
data, two types of backdoor data paradigms are introduced with trigger in the
instruction and the input, respectively. By leveraging LLM's learning
capability to incorporate customized backdoor samples into the dataset, the
proposed approach effectively injects specific watermarking information into
the customized model during fine-tuning, which makes it easy to inject and
verify watermarks in commercial scenarios. We evaluate the proposed "Double-I
watermark" under various fine-tuning methods, demonstrating its harmlessness,
robustness, uniqueness, imperceptibility, and validity through both theoretical
analysis and experimental verification.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要