Hierarchical Graph Pattern Understanding for Zero-Shot VOS
CoRR (2023)
Abstract
The optical flow guidance strategy is ideal for obtaining motion information
of objects in the video. It is widely utilized in video segmentation tasks.
However, existing optical flow-based methods depend heavily on the flow
estimate, so they perform poorly in scenes where optical flow estimation
fails. The temporal consistency provided by optical flow can be effectively
supplemented by structural modeling. This paper proposes a new hierarchical
graph neural network (GNN)
architecture, dubbed hierarchical graph pattern understanding (HGPU), for
zero-shot video object segmentation (ZS-VOS). Inspired by the strong ability of
GNNs in capturing structural relations, HGPU innovatively leverages motion cues
(i.e., optical flow) to enhance the high-order representations from the
neighbors of target frames. Specifically, a hierarchical graph pattern encoder
with message aggregation is introduced to acquire different levels of motion
and appearance features in a sequential manner. Furthermore, a decoder is
designed for hierarchically parsing and understanding the transformed
multi-modal contexts to achieve more accurate and robust results. HGPU achieves
state-of-the-art performance on four publicly available benchmarks (DAVIS-16,
YouTube-Objects, Long-Videos, and DAVIS-17). Code and a pre-trained model are
available at https://github.com/NUST-Machine-Intelligence-Laboratory/HGPU.
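
As a rough illustration of what "message aggregation" over motion and appearance nodes can mean, the sketch below runs one round of generic mean-aggregation message passing on a toy graph. It is not the HGPU encoder: the node names (`rgb_t`, `flow_t`, `rgb_t+1`), the function `aggregate_messages`, the mean aggregator, and the residual update are all assumptions chosen for illustration, and the 2-dimensional vectors stand in for real feature maps.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_messages(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
) -> Dict[str, Vector]:
    """One round of mean-aggregation message passing with a residual update.

    Generic GNN-style sketch (hypothetical helper, not the authors' code).
    """
    updated: Dict[str, Vector] = {}
    for node, feat in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            # Isolated node: nothing to aggregate, keep its feature unchanged.
            updated[node] = list(feat)
            continue
        # Mean of the neighbors' features, computed per dimension.
        message = [
            sum(features[n][d] for n in nbrs) / len(nbrs)
            for d in range(len(feat))
        ]
        # Residual update: original feature plus the aggregated message.
        updated[node] = [f + m for f, m in zip(feat, message)]
    return updated


# Hypothetical toy graph: an appearance node and a motion (optical-flow) node
# for target frame t, plus an appearance node for a neighboring frame t+1.
features = {
    "rgb_t": [1.0, 0.0],
    "flow_t": [0.0, 2.0],
    "rgb_t+1": [3.0, 0.0],
}
neighbors = {
    "rgb_t": ["flow_t", "rgb_t+1"],
    "flow_t": ["rgb_t"],
    "rgb_t+1": ["rgb_t"],
}
updated = aggregate_messages(features, neighbors)
```

In this toy setup the target frame's appearance node receives information from both its own motion node and the neighboring frame. A hierarchical variant would repeat such a round at several feature levels, which this sketch does not attempt.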