Improving Visual Reasoning With Attention Alignment

Advances in Visual Computing, ISVC 2019, Pt I (2020)

Abstract
Since attention mechanisms were introduced, they have become an important component of neural network architectures, because they mimic how humans reason about visual stimuli by focusing on the important parts of the input. In visual tasks such as image captioning and visual question answering (VQA), however, networks can generate the correct answer or a comprehensible caption while attending to the wrong part of an image or text. This lack of synchronization between human and network attention hinders the model's ability to generalize. To improve the human-like reasoning capabilities of the model, it is necessary to align what the network and a human focus on, given the same input. We propose a mechanism to correct visual attention in the network by explicitly training the model to learn the salient parts of an image available in the VQA-HAT dataset. The results show an improvement in the visual question answering task across different types of questions.
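The abstract does not give implementation details, but the core idea of explicitly supervising attention with human maps is commonly realized as an auxiliary loss that measures the divergence between the network's attention distribution and the human attention map. The sketch below, a hypothetical illustration not taken from the paper, uses KL divergence over normalized attention maps; the function name and the choice of divergence are assumptions.

```python
import numpy as np

def attention_alignment_loss(predicted, human, eps=1e-8):
    """Auxiliary attention-supervision loss (illustrative sketch).

    Both inputs are non-negative attention maps over image regions
    (e.g. a 14x14 grid). Each is flattened and normalized to a
    probability distribution, then the KL divergence
    KL(human || predicted) is returned. A value near zero means the
    network attends to the same regions as the human annotator.
    """
    p = np.asarray(human, dtype=np.float64).ravel()
    q = np.asarray(predicted, dtype=np.float64).ravel()
    # Normalize to probability distributions; eps guards against
    # division by zero and log(0).
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

In a training loop, this term would typically be added to the task loss with a weighting coefficient, so the model is optimized jointly for answer accuracy and attention alignment.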
Keywords
Neural attention, Neural networks, Visual Question Answering (VQA), Supervised learning, Artificial intelligence