A question-guided multi-hop reasoning graph network for visual question answering

Information Processing & Management(2023)

Abstract
Visual Question Answering (VQA) requires reasoning about visually grounded relations in the image and the question context. A crucial aspect of solving complex questions is reliable multi-hop reasoning, i.e., dynamically learning the interplay between visual entities at each step. In this paper, we investigate the potential of a reasoning graph network on multi-hop reasoning questions, especially over 3 “hops.” We call this model QMRGT: A Question-Guided Multi-hop Reasoning Graph Network. It constructs a cross-modal interaction module (CIM) and a multi-hop reasoning graph network (MRGT) and infers an answer by dynamically updating the inter-associated instruction between two modalities. Our graph reasoning module can be applied to any multi-modal model. Experiments on the VQA 2.0 and GQA datasets (in fully supervised and out-of-distribution (O.O.D.) settings) show that both QMRGT and pre-trained V&L models + MRGT improve performance on visual question answering tasks. Graph-based multi-hop reasoning provides an effective signal for the visual question answering challenge, for both O.O.D. and high-level reasoning questions.
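To make the idea of question-guided multi-hop reasoning over a graph of visual entities concrete, the following is a minimal toy sketch. It is not the authors' QMRGT or MRGT implementation: the function names (`reasoning_hop`, `multi_hop`, `readout`), the dot-product attention, and the 0.5/0.5 update rule are all assumptions chosen for illustration. Each hop lets every entity aggregate its neighbours, weighted by how well each neighbour matches the question vector, so information can travel one edge further per hop.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reasoning_hop(nodes, edges, question):
    """One hop (hypothetical rule): each node averages itself with a
    question-weighted mix of its neighbours' current features."""
    updated = []
    for i, h in enumerate(nodes):
        neighbours = edges.get(i, [])
        if not neighbours:
            # Isolated nodes keep their features unchanged.
            updated.append(list(h))
            continue
        # Neighbours that align with the question receive more weight.
        weights = softmax([dot(nodes[j], question) for j in neighbours])
        message = [
            sum(w * nodes[j][d] for w, j in zip(weights, neighbours))
            for d in range(len(h))
        ]
        updated.append([0.5 * a + 0.5 * b for a, b in zip(h, message)])
    return updated


def multi_hop(nodes, edges, question, hops=3):
    """Apply the hop several times; all nodes update synchronously."""
    for _ in range(hops):
        nodes = reasoning_hop(nodes, edges, question)
    return nodes


def readout(nodes, question):
    """Question-weighted pooling of node features into one vector."""
    weights = softmax([dot(h, question) for h in nodes])
    return [
        sum(w * h[d] for w, h in zip(weights, nodes))
        for d in range(len(question))
    ]


if __name__ == "__main__":
    # Three entities in a chain 0 - 1 - 2 with 2-dimensional toy features.
    nodes = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
    edges = {0: [1], 1: [0, 2], 2: [1]}
    question = [1.0, 0.0]
    print(readout(multi_hop(nodes, edges, question, hops=3), question))
```

In this chain, entity 2 is two edges away from entity 0, so it only receives entity 0's signal once at least two hops have been run. That is the intuition behind needing several hops for questions that chain relations between entities.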
Keywords
Visual question answering, Multi-hop reasoning, Reasoning graph network