CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following
arXiv (2024)
Abstract
With the advancement of language models (LMs), their exposure to private data
is increasingly inevitable, and their deployment (especially of smaller models)
on personal devices, such as PCs and smartphones, has become a prevailing
trend. In contexts laden with user information, enabling models to both
safeguard user privacy and execute instructions efficiently emerges as an
essential research imperative. In this paper, we propose CoGenesis, a
collaborative generation framework that integrates large models (hosted on
cloud infrastructure) and small models (deployed on local devices) to address
privacy concerns in a principled way. First, we design a pipeline to create
personalized writing-instruction datasets enriched with extensive context
details, serving as the testbed for this research problem. We then introduce
two variants of CoGenesis, based on sketches and on logits, respectively. Our
experimental findings, based on our synthesized dataset and two additional
open-source datasets, indicate that: 1) large-scale models perform well when
provided with user context but struggle in its absence; 2) specialized smaller
models fine-tuned on the synthetic dataset show promise but still lag behind
their larger counterparts; 3) our CoGenesis framework, which combines models of
mixed scales, achieves competitive performance, offering a feasible solution to
privacy concerns.
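
The abstract leaves the mechanics of the logits-based variant unstated, but the high-level setup it describes — a large cloud model generating without access to private context while a small on-device model conditions on it — can be illustrated with a minimal decoding sketch. The stand-in models `cloud_logits` and `local_logits`, the mixing weight `alpha`, and the greedy fusion rule below are assumptions made for illustration only, not the paper's actual procedure.

```python
# Toy sketch of logits-level collaboration: a large cloud model that never
# sees private context and a small on-device model that does, mixing their
# next-token logits at each decoding step. The stand-in models and the
# fusion rule are hypothetical, not the method from the paper.

import numpy as np

VOCAB_SIZE = 32  # toy vocabulary for illustration


def cloud_logits(tokens):
    """Stand-in for the large cloud model: next-token logits conditioned
    on the instruction only (no private context is ever sent to it)."""
    seed = hash(tuple(tokens)) % 2**32
    return np.random.default_rng(seed).normal(size=VOCAB_SIZE)


def local_logits(tokens, private_context):
    """Stand-in for the small on-device model: next-token logits
    conditioned on both the instruction and the private user context."""
    seed = hash((tuple(tokens), private_context)) % 2**32
    return np.random.default_rng(seed).normal(size=VOCAB_SIZE)


def fused_decode(prompt_tokens, private_context, alpha=0.5, steps=10):
    """Greedy decoding from a weighted mixture of the two models' logits.
    `alpha` is an assumed mixing weight; the paper may use another rule."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        mixed = (alpha * cloud_logits(tokens)
                 + (1 - alpha) * local_logits(tokens, private_context))
        tokens.append(int(np.argmax(mixed)))
    return tokens


print(fused_decode([1, 2, 3], private_context="on-device user profile"))
```

The property this sketch aims to capture is that the private context is passed only to the local model, so sensitive data never leaves the device; the two models cooperate purely through their next-token logits.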