Privacy in the Time of Language Models.

WSDM(2023)

引用 6|浏览57
暂无评分
摘要
Pretrained large language models (LLMs) have consistently shown state-of-the-art performance across multiple natural language processing (NLP) tasks. These models are of much interest for a variety of industrial applications that use NLP as a core component. However, LLMs have also been shown to memorize portions of their training data, which can contain private information. Therefore, when building and deploying LLMs, it is of value to apply privacy-preserving techniques that protect sensitive data. In this talk, we discuss privacy measurement and preservation techniques for LLMs that can be applied in the context of industrial applications and present case studies of preliminary solutions. We discuss select strategies and metrics relevant for measuring memorization in LLMs that can, in turn, be used to measure privacy-risk in these models. We then discuss privacy-preservation techniques that can be applied at different points of the LLM training life-cycle; including our work on an algorithm for fine-tuning LLMs with improved privacy. In addition, we discuss our work on privacy-preserving solutions that can be applied to LLMs during inference and are feasible for use at run time.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要