Visual Question Generation as Dual Task of Visual Question Answering

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018)

Cited 161 | Viewed 102
Abstract
Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision, but they are usually explored separately despite their intrinsic complementary relationship. In this paper, we propose an end-to-end unified model, the Invertible Question Answering Network (iQAN), which introduces question generation as a dual task of question answering to improve VQA performance. With our proposed invertible bilinear fusion module and parameter sharing scheme, iQAN can accomplish VQA and its dual task VQG simultaneously. By jointly training on the two tasks with our proposed dual regularizers (termed Dual Training), our model gains a better understanding of the interactions among images, questions, and answers. After training, iQAN can take either a question or an answer as input and output the counterpart. Evaluated on the CLEVR and VQA2 datasets, iQAN improves the top-1 accuracy of the prior-art MUTAN VQA method by 1.33% and 0.88% (absolute), respectively. We also show that the proposed dual training framework consistently improves the performance of many popular VQA architectures.
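The abstract does not give the exact form of the dual regularizers, so the following is only a minimal sketch of how a joint VQA/VQG objective of this kind might be assembled in PyTorch. The class name DualTrainingLoss, the MSE agreement term between the two directions' question features, the lambda_dual weight, and all tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTrainingLoss(nn.Module):
    # Hypothetical joint objective: primal VQA loss + dual VQG loss + a
    # duality regularizer that ties the question representations produced
    # by the two directions (a stand-in for the paper's dual regularizers).
    def __init__(self, lambda_dual=0.1):
        super().__init__()
        self.lambda_dual = lambda_dual
        self.vqa_criterion = nn.CrossEntropyLoss()                 # answer classification
        self.vqg_criterion = nn.CrossEntropyLoss(ignore_index=0)   # question tokens, 0 = <pad>

    def forward(self, vqa_logits, answer_labels,
                vqg_logits, question_tokens,
                q_feat_primal, q_feat_dual):
        # Primal task: predict the answer from (image, question).
        vqa_loss = self.vqa_criterion(vqa_logits, answer_labels)
        # Dual task: generate the question from (image, answer),
        # scored token by token against the ground-truth question.
        vqg_loss = self.vqg_criterion(
            vqg_logits.reshape(-1, vqg_logits.size(-1)),
            question_tokens.reshape(-1),
        )
        # Duality regularizer: encourage both directions to agree on
        # the shared question feature.
        dual_reg = F.mse_loss(q_feat_primal, q_feat_dual)
        return vqa_loss + vqg_loss + self.lambda_dual * dual_reg

# Toy usage with random tensors: batch of 4, 1000 candidate answers,
# questions of length 12 over a 5000-word vocabulary, 512-d features.
criterion = DualTrainingLoss(lambda_dual=0.1)
loss = criterion(
    torch.randn(4, 1000), torch.randint(0, 1000, (4,)),
    torch.randn(4, 12, 5000), torch.randint(0, 5000, (4, 12)),
    torch.randn(4, 512), torch.randn(4, 512),
)
print(float(loss))
```

In this reading, the regularizer is what couples the two tasks: without it, the sketch degenerates into independently trained VQA and VQG heads that merely share parameters.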
Keywords
visual question generation, visual question answering, end-to-end unified model, iQAN, VQA performance, invertible bilinear fusion module, parameter sharing scheme, dual task VQG, dual regularizers, VQA2 datasets, dual training framework, invertible question answering network, MUTAN VQA method