PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps
arXiv (2024)
Abstract
The pre-training and fine-tuning paradigm has demonstrated its effectiveness
and has become the standard approach for tailoring language models to various
tasks. Currently, community-based platforms offer easy access to various
pre-trained models, as anyone can publish without strict validation processes.
However, a released pre-trained model can be a privacy trap for fine-tuning
datasets if it is carefully designed. In this work, we propose the PreCurious
framework to reveal a new attack surface in which the attacker releases the
pre-trained model and gets black-box access to the final fine-tuned model.
PreCurious aims to escalate the general privacy risk of both membership
inference and data extraction. The key intuition behind PreCurious is to
manipulate the memorization stage of the pre-trained model and guide
fine-tuning with a seemingly legitimate configuration. Defending a fine-tuned
model against privacy attacks seems promising, as empirical and theoretical
evidence suggests that parameter-efficient and differentially private
fine-tuning techniques are invulnerable to privacy attacks. However, PreCurious
demonstrates that this invulnerability can be broken in a stealthy manner,
compared to fine-tuning on a benign model.
By further leveraging a sanitized dataset, PreCurious can extract originally
unexposed secrets under differentially private fine-tuning. Thus, PreCurious
raises warnings for users who download pre-trained models from unknown sources,
rely solely on tutorials or common-sense defenses, and have previously released
sanitized datasets, even after perfect scrubbing.