Loss Re-Scaling VQA: Revisiting the Language Prior Problem From a Class-Imbalance View

IEEE Transactions on Image Processing (2022)

Cited by 53 | Viewed 264 times
Abstract
Recent studies have pointed out that many well-developed Visual Question Answering (VQA) models are heavily affected by the language prior problem, i.e., making predictions based on co-occurrence patterns between textual questions and answers rather than reasoning over visual content. To tackle this problem, most existing methods focus on strengthening the visual feature learning capability in order to reduce the influence of this textual shortcut on model decisions. However, few efforts have been devoted to analyzing its underlying cause or providing an explicit interpretation, leaving the research community without clear guidance for moving forward purposefully and causing perplexity in model construction when tackling this non-trivial problem. In this paper, we propose to interpret the language prior problem in VQA from a class-imbalance view. Concretely, we design a novel interpretation scheme whereby the losses of mis-predicted frequent and sparse answers from the same question type are distinctly exhibited during the late training phase. This explicitly reveals why a VQA model tends to produce a frequent yet obviously wrong answer to a question whose correct answer is sparse in the training set. Based on this observation, we further propose a novel loss re-scaling approach that assigns each answer a weight according to the training data statistics when estimating the final loss. We apply our approach to six strong baselines, and the experimental results on two VQA-CP benchmark datasets clearly demonstrate its effectiveness. In addition, we validate the class-imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
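To make the loss re-scaling idea concrete, below is a minimal sketch (not the authors' released code) of frequency-based loss re-weighting: answers that are sparse in the training statistics receive larger loss weights, while frequent answers receive smaller ones. The function names, the inverse-frequency weighting rule, and the normalization are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def answer_weights(answer_counts, eps=1.0):
    """Compute per-answer weights from training-set answer counts.

    answer_counts: 1-D tensor, answer_counts[i] = number of occurrences of
    answer i in the training split (optionally grouped by question type).
    """
    counts = answer_counts.float() + eps             # smooth zero counts
    weights = 1.0 / counts                           # inverse frequency
    weights = weights * len(counts) / weights.sum()  # keep mean weight near 1
    return weights

def rescaled_loss(logits, targets, weights):
    """Cross-entropy where each ground-truth answer's loss is re-scaled."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    return (weights[targets] * per_sample).mean()

# Usage: answer 0 (e.g., "yes") dominates the training set, answer 2 is rare.
counts = torch.tensor([9000, 800, 12])
w = answer_weights(counts)
logits = torch.randn(4, 3)            # 4 questions, 3 candidate answers
targets = torch.tensor([0, 0, 2, 1])
loss = rescaled_loss(logits, targets, w)
```

Under this kind of weighting, mis-predicting a sparse answer contributes more to the loss than mis-predicting a frequent one, counteracting the tendency to default to frequent answers.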
Keywords
Visual question answering, language prior problem, class imbalance, loss re-scaling