Boosting Scene Graph Generation with Contextual Information

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS(2024)

引用 0|浏览5
暂无评分
摘要
Scene graph generation (SGG) has been developed to detect objects and their relationships from the visual data and has attracted increasing attention in recent years. Existing works have focused on extracting object context for SGG. However, very few works have attempted to exploit implicit contextual correlations among relationships of the objects. Furthermore, most existing SGG schemes rely on high-level features to predict the predicates while overlooking the potential inherent association of low-level features with the object relationships. We present in this article a novel scheme to capture enhanced contextual information for both objects and relationships. We design a Dual-branch Context Analysis Transformer (DCAT) architecture to extract both object context and relationship context from the visual data with dual transformer branches and then effectively fuse both high-level and low-level features by an adaptive approach to facilitate relationship prediction. Specifically, we first conduct feature representation learning to enrich relation representations by the visual, spatial, and linguistic feature extractors. Next, two transformer branches are designed to leverage the modeling of global associative interaction and mine the hidden association among objects and relationships. Then, we devise a novel feature disentangling method to decouple contextualized high-level features with guidance from the visual semantics. Finally, we develop a refined attention module to perform low-level feature recalibration for the refinement of the final predicate prediction. Experiments on Visual Genome and Action Genome datasets demonstrate the effectiveness of DCAT for both image and video SGG settings. Moreover, we also test the quality of the generated image scene graphs to verify the generalizability on downstream tasks like sentence-to-graph retrieval and image retrieval.
更多
查看译文
关键词
Scene graph generation,Visual relationship detection
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要