Overcoming Language Priors via Shuffling Language Bias for Robust Visual Question Answering

J. Zhao, Z. Yu, X. Zhang, Y. Yang

IEEE Access (2023)

Abstract
Recent research has revealed the notorious language prior problem in visual question answering (VQA) tasks based on visual-textual interaction: well-trained VQA models learn shortcuts from questions without fully considering visual evidence. To tackle this problem, most existing methods reduce the incentive to learn prior knowledge by adding a question-only branch, settling for mechanical gains in accuracy. However, these methods over-correct positive biases that are useful for generalization, so performance on the VQA v2 dataset degrades when they are combined with other VQA architectures. In this paper, we propose a robust shuffling language bias (SLB) approach that explicitly balances the prediction distribution, alleviating the language prior by increasing training opportunities for VQA models. Experimental results demonstrate that our method composes with data augmentation and large-scale pre-trained VQA architectures and achieves competitive performance on both the in-domain benchmark VQA v2 and the out-of-distribution benchmark VQA-CP v2.
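The abstract does not detail the SLB procedure, but its stated goal, explicitly balancing the prediction distribution so that the question alone carries less answer signal, can be illustrated with a toy sketch. The Python example below is not the paper's algorithm; the sample format with qtype and answer fields and the oversampling-by-duplication strategy are assumptions made purely for illustration.

import random
from collections import defaultdict

def balance_answer_distribution(samples, rng=None):
    """Toy rebalancing: within each question type, oversample rare
    answers by duplication until every answer is equally frequent.
    Illustrates distribution balancing only; this is NOT the SLB
    algorithm from the paper, whose details the abstract omits."""
    rng = rng or random.Random(0)
    groups = defaultdict(lambda: defaultdict(list))
    for s in samples:
        groups[s["qtype"]][s["answer"]].append(s)

    balanced = []
    for by_answer in groups.values():
        target = max(len(items) for items in by_answer.values())
        for items in by_answer.values():
            balanced.extend(items)
            # Duplicate minority-answer samples up to the majority count.
            balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced

# Example: "how many" questions biased toward the answer "2".
data = ([{"qtype": "how many", "answer": "2"}] * 8
        + [{"qtype": "how many", "answer": "4"}] * 2)
balanced = balance_answer_distribution(data)
print(sum(s["answer"] == "4" for s in balanced))  # 8, matching "2"

Once per-question-type answer counts are equal, a question-only classifier gains nothing over chance within each question type, which removes the shortcut where, say, "how many" questions can be answered "2" without looking at the image.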
Keywords
Visual question answering, language prior, data balance, data augmentation