Retrieving Multimodal Information for Augmented Generation: A Survey

Ruochen Zhao, Hailin Chen, Weishi Wang, Fangkai Jiao, Xuan Long Do, Chengwei Qin, Bosheng Ding, Xiaobao Guo, Minzhi Li, Xingxuan Li, Shafiq Joty

arXiv (2023)

Abstract
In this survey, we review methods that retrieve multimodal knowledge to assist and augment generative models. This line of work focuses on retrieving grounding contexts from external sources, including images, code, tables, graphs, and audio. As multimodal learning and generative AI have become increasingly impactful, such retrieval augmentation offers a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. We provide an in-depth review of retrieval-augmented generation in different modalities and discuss potential future directions. As this is an emerging field, we continue to add new papers and methods.