Counterfactual Visual Dialog: Robust Commonsense Knowledge Learning From Unbiased Training

An-An Liu, Chenxi Huang, Ning Xu, Hongshuo Tian, Jing Liu, Yongdong Zhang

IEEE Transactions on Multimedia (2024)

Abstract
Visual Dialog (VD) requires an agent to answer the current question by engaging in a conversation with humans about an image. Despite recent progress, introducing external commonsense knowledge is beneficial for fully understanding the given image and dialog history. However, existing knowledge-based VD models tend to rely on the severe learning bias introduced by commonsense knowledge. For example, the retrieved triplets <bus, capable of, transport people>, <bus, is a, public transport>, and <bus, is a, car> can induce a spurious correlation between the question "What is the bus used for?" and the false answer "City bus". Two challenges stand in the way of making commonsense learning more robust against spurious correlations: 1) how to disentangle the true effect of "good" commonsense knowledge from the whole, and 2) how to estimate and remove the effect of "bad" commonsense bias on answers. In this article, we propose a novel CounterFactual Commonsense learning scheme for the Visual Dialog task (CFC-VD). First, compared with the causal graph of existing VD models, we add one new commonsense node and one new link to the multi-modal information from history, question, and image. Since the retrieved knowledge prior is subtle and uncontrollable, we treat it as an unobserved confounder in the commonsense node, which leads to spurious correlations in answer inference. Then, to remove the effect of the confounder, we formulate it as the direct causal effect of commonsense on answers and remove this direct language effect by subtracting it from the total causal effect via counterfactual reasoning. Experimental results demonstrate the effectiveness of our method on the prevailing VisDial v0.9 and VisDial v1.0 datasets.
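The subtraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the candidate answers, and all score values are hypothetical, chosen only to show how subtracting a commonsense-only (direct effect) score from the fused (total effect) score could change an answer ranking.

```python
from typing import List, Sequence


def debias_scores(total_effect: Sequence[float],
                  direct_effect: Sequence[float]) -> List[float]:
    """Subtract the direct effect from the total effect, per candidate.

    total_effect:  scores from a model that sees history, question, image,
                   and commonsense (assumed to be given).
    direct_effect: scores from a commonsense-only branch, i.e. the
                   counterfactual where the multi-modal inputs are blocked
                   (assumed to be given).
    """
    if len(total_effect) != len(direct_effect):
        raise ValueError("score lists must have the same length")
    return [t - d for t, d in zip(total_effect, direct_effect)]


def rank_answers(candidates: Sequence[str],
                 scores: Sequence[float]) -> List[str]:
    """Return the candidates sorted from highest to lowest score."""
    order = sorted(range(len(candidates)),
                   key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]


# Hypothetical scores for the question "What is the bus used for?".
candidates = ["transport people", "city bus", "yes"]
total_effect = [2.0, 2.5, 0.1]   # fused model: biased toward "city bus"
direct_effect = [0.5, 1.5, 0.0]  # commonsense-only branch: carries the bias

biased_ranking = rank_answers(candidates, total_effect)
debiased_ranking = rank_answers(
    candidates, debias_scores(total_effect, direct_effect))

print(biased_ranking[0])    # top answer before subtraction
print(debiased_ranking[0])  # top answer after subtraction
```

In this toy setting the commonsense-only branch strongly favors "city bus", so removing its contribution lets the answer supported by the full multi-modal context rise to the top. How the two score vectors are actually produced and combined is specified in the paper, not here.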
Keywords
Visualization, Commonsense reasoning, History, Task analysis, Correlation, Knowledge based systems, Computational modeling, Visual dialog, commonsense, multi-modal, counterfactual