Multimodal feature fusion by relational reasoning and attention for visual question answering
Information Fusion (2020)
Abstract
• The proposed visual relational reasoning module reasons about relationships between visual objects.
• Bilinear visual attention, combined with bottom-up attention, yields discriminative features.
• Jointly learning visual relations and attention over image regions contributes to multimodal feature fusion.
• The proposed visual question answering model achieved new state-of-the-art single-model performance on popular datasets.
Keywords
Multimodal fusion, Visual question answering, Visual relational reasoning, Attention mechanism