Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance
CoRR(2023)
摘要
Retrieval augmentation enhances performance of traditional language models by
incorporating additional context. However, the computational demands for
retrieval augmented large language models (LLMs) pose a challenge when applying
them to real-time tasks, such as composition assistance. To address this
limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG)
framework, a novel approach that efficiently combines a cloud-based LLM with a
smaller, client-side, language model through retrieval augmented memory. This
integration enables the client model to generate effective responses,
benefiting from the LLM's capabilities and contextual information.
Additionally, through an asynchronous memory update mechanism, the client model
can deliver real-time completions swiftly to user inputs without the need to
wait for responses from the cloud. Our experiments on five benchmark datasets
demonstrate that HybridRAG significantly improves utility over client-only
models while maintaining low latency.
更多查看译文
关键词
composition,real-time real-time,retrieval-augmented
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要