SDGIN: Structure-aware dual-level graph interactive network with semantic roles for visual dialog

KNOWLEDGE-BASED SYSTEMS(2024)

Abstract
Visual Dialog aims to produce an appropriate response based on a multi-round dialog history and a given image. Existing methods either focus on semantic interaction or only implicitly capture coarse-grained structural interaction (e.g., pronoun co-references). Fine-grained, explicit structural interaction features of the dialog history are seldom explored, which leads to insufficient feature learning and difficulty in capturing precise context. To address these issues, we propose a structure-aware dual-level graph interactive network (SDGIN) that integrates verb-specific semantic roles and co-reference resolution to explicitly capture context structural features for discriminative and generative tasks in visual dialog. Specifically, we create a novel structural interaction graph that injects syntactic knowledge priors into the dialog by introducing semantic role labeling, which indicates which words are sentence stems. Furthermore, considering the single-perspective limitation of previous algorithms, we design a dual-perspective mechanism that learns fine-grained token-level context structure features and coarse-grained utterance-level interactions in parallel. It offers an elegant way to explore precise context interactions, realizing the mutual complementation and enhancement of features at different granularities. Experimental results show the superiority of our proposed approach. Compared with other task-specific models, SDGIN outperforms previous models and achieves a significant improvement on the benchmark dataset VisDial v1.0.
Keywords
Visual Dialog, Context structural reasoning, Dual-level graph interactive network
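
The abstract describes a token-level structural interaction graph built from verb-specific semantic roles and co-reference links. The sketch below is only an illustration of that idea under simple assumptions, not the authors' implementation: the function name `build_structural_graph`, the input format for semantic role frames (a verb index plus argument indices), and the use of a plain 0/1 adjacency matrix are all hypothetical choices made here, and the utterance-level branch is not covered.

```python
from itertools import combinations


def build_structural_graph(tokens, srl_frames, coref_clusters):
    """Build a symmetric 0/1 adjacency matrix over dialog tokens.

    tokens:         list of token strings for the dialog history.
    srl_frames:     list of dicts {"verb": int, "args": [int, ...]} holding
                    token indices (hypothetical format, assumed to come from
                    an external semantic role labeler).
    coref_clusters: list of lists of token indices that refer to the same
                    entity (assumed to come from an external co-reference
                    resolver).
    """
    n = len(tokens)
    adj = [[0] * n for _ in range(n)]

    # Self-loops so every token keeps its own features during message passing.
    for i in range(n):
        adj[i][i] = 1

    # Connect each verb to the arguments that fill its semantic roles.
    for frame in srl_frames:
        verb = frame["verb"]
        for arg in frame["args"]:
            adj[verb][arg] = adj[arg][verb] = 1

    # Connect every pair of mentions inside the same co-reference cluster.
    for cluster in coref_clusters:
        for i, j in combinations(cluster, 2):
            adj[i][j] = adj[j][i] = 1

    return adj


# Toy dialog: "is the man smiling ? yes he is"
tokens = ["is", "the", "man", "smiling", "?", "yes", "he", "is"]
srl_frames = [{"verb": 3, "args": [2]}]   # "smiling" takes "man" as an argument
coref_clusters = [[2, 6]]                 # "man" and "he" co-refer
adjacency = build_structural_graph(tokens, srl_frames, coref_clusters)
```

In this toy example, "smiling" is linked to "man" through the semantic role edge and "man" is linked to "he" through the co-reference edge, while "smiling" and "he" are not directly connected. A graph network operating on such a matrix could propagate information along these explicit structural edges; how SDGIN actually weights or encodes them is not stated in the abstract.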