Zero-Shot Predicate Prediction for Scene Graph Parsing.

IEEE Trans. Multim. (2023)

A scene graph is a structured semantic representation of an image in which vertices represent objects and edges represent the relationships between them. Since it is impossible to manually label all potential relationships in the real world, some previous methods apply zero-shot learning to scene graph generation. However, existing methods take the triplet (i.e., subject-predicate-object) as the basic unit of a relationship, so only the combination is unseen: each element (i.e., subject, predicate, or object) of an unseen relationship has actually been seen in the training data. These methods therefore ignore unseen predicates. To predict unseen predicates, we introduce a novel task named zero-shot predicate prediction, which is crucial for extending existing scene graph generation methods to recognize more relationship classes. The new task is challenging and cannot be resolved simply by applying conventional zero-shot learning methods, because each predicate exhibits large intra-class variation. First, this variation makes it difficult to compute a discriminative instance-level feature for a predicate class. Second, it makes transferring knowledge from seen classes to unseen classes more difficult. To address the first challenge, we distill lexical knowledge of different objects and construct multi-modal representations of pairwise objects to reduce the intra-class variation of the predicate. To address the second challenge, we build a compact semantic space in which the representations of unseen classes are reconstructed from those of the seen classes for zero-shot predicate classification. We evaluate the proposed method on the public Visual Genome dataset. Extensive experimental results under the zero-shot, few-shot, and supervised settings demonstrate the effectiveness of the proposed method.
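The idea of reconstructing unseen-class representations from seen classes can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the semantic space is built, so the sketch assumes a simple scheme in which each unseen predicate's prototype is a softmax-weighted average of seen-class prototypes, with weights given by word-vector similarity. All names (`reconstruct_prototype`, `classify`, the `temperature` parameter) and all vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs, temperature=0.1):
    """Numerically stable softmax; temperature is an assumed hyperparameter."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct_prototype(unseen_word_vec, seen_word_vecs, seen_prototypes,
                          temperature=0.1):
    """Build an unseen-class prototype as a weighted sum of seen prototypes.

    Weights come from the lexical similarity between the unseen predicate
    and each seen predicate (an assumption made for this illustration).
    """
    names = sorted(seen_word_vecs)
    sims = [cosine(unseen_word_vec, seen_word_vecs[n]) for n in names]
    weights = softmax(sims, temperature)
    dim = len(seen_prototypes[names[0]])
    proto = [0.0] * dim
    for w, n in zip(weights, names):
        for i, x in enumerate(seen_prototypes[n]):
            proto[i] += w * x
    return proto


def classify(feature, prototypes):
    """Assign a pairwise-object feature to the most similar class prototype."""
    return max(prototypes, key=lambda n: cosine(feature, prototypes[n]))


# Toy data (hypothetical): two seen predicates with word vectors and
# prototypes, and two unseen predicates known only by their word vectors.
seen_word_vecs = {"on": [1.0, 0.0], "holding": [0.0, 1.0]}
seen_prototypes = {"on": [1.0, 0.0, 0.0], "holding": [0.0, 1.0, 0.0]}
unseen_word_vecs = {"above": [0.9, 0.2], "carrying": [0.2, 0.9]}

unseen_prototypes = {
    name: reconstruct_prototype(vec, seen_word_vecs, seen_prototypes)
    for name, vec in unseen_word_vecs.items()
}

print(classify([0.1, 0.9, 0.0], unseen_prototypes))  # carrying
print(classify([0.8, 0.1, 0.0], unseen_prototypes))  # above
```

In this toy setup the unseen predicate "carrying" inherits most of its prototype from the lexically close seen predicate "holding", so a feature resembling "holding" is assigned to "carrying" when classification is restricted to the unseen classes.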
Key words
Deep learning, zero-shot, scene graph