Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance
CoRR(2023)
Abstract
Retrieval augmentation enhances performance of traditional language models by
incorporating additional context. However, the computational demands for
retrieval augmented large language models (LLMs) pose a challenge when applying
them to real-time tasks, such as composition assistance. To address this
limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG)
framework, a novel approach that efficiently combines a cloud-based LLM with a
smaller, client-side, language model through retrieval augmented memory. This
integration enables the client model to generate effective responses,
benefiting from the LLM's capabilities and contextual information.
Additionally, through an asynchronous memory update mechanism, the client model
can deliver real-time completions swiftly to user inputs without the need to
wait for responses from the cloud. Our experiments on five benchmark datasets
demonstrate that HybridRAG significantly improves utility over client-only
models while maintaining low latency.
MoreTranslated text
Key words
composition,real-time real-time,retrieval-augmented
AI Read Science
Must-Reading Tree
Example
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined