Multi-view Aggregation Network for Dichotomous Image Segmentation
CVPR 2024
Abstract
Dichotomous Image Segmentation (DIS) has recently emerged as a task targeting
high-precision object segmentation in high-resolution natural images.
When designing an effective DIS model, the main challenge is balancing
the semantic dispersion of high-resolution targets within a small receptive field
against the loss of high-precision details within a large receptive field. Existing
methods rely on cumbersome combinations of multiple encoder-decoder streams and
stages to gradually accomplish global localization and local refinement.
The human visual system captures regions of interest by observing them from
multiple views. Inspired by this, we model DIS as a multi-view object perception
problem and propose a parsimonious multi-view aggregation network (MVANet),
which unifies the feature fusion of the distant view and the close-up view into a
single stream with one encoder-decoder structure. With the proposed
multi-view complementary localization and refinement modules, our approach
establishes long-range, profound visual interactions across multiple views,
allowing the features of the detailed close-up view to focus on highly slender
structures. Experiments on the popular DIS-5K dataset show that MVANet
significantly outperforms state-of-the-art methods in both accuracy and speed.
The source code and datasets will be publicly available at
https://github.com/qianyu-dlut/MVANet.
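To make the single-stream, multi-view idea more concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. It assumes (the abstract does not say this) that the distant view is a 2x average-pooled copy of the whole image and that the close-up views are its four full-resolution quadrants, so all five views share one resolution and can pass through one shared encoder as a single batch. The cross-view step is plain single-head scaled dot-product attention, used here only as a stand-in for the paper's localization module. The function names `build_views` and `cross_view_attention` are hypothetical.

```python
import numpy as np


def build_views(image: np.ndarray) -> np.ndarray:
    """Stack one distant view and four close-up views into one batch.

    Hypothetical helper. Assumption: image is (H, W, C) with even H and W.
    The distant view is a 2x2 average-pooled copy of the whole image; each
    close-up view is one full-resolution quadrant. All five views come out
    as (H/2, W/2, C), so a single shared encoder could process them together.
    """
    h, w, c = image.shape
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    # Distant view: average each non-overlapping 2x2 block of pixels.
    distant = image.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    # Close-up views: the four quadrants at native resolution.
    hh, hw = h // 2, w // 2
    close_ups = [image[:hh, :hw], image[:hh, hw:], image[hh:, :hw], image[hh:, hw:]]
    return np.stack([distant, *close_ups], axis=0)  # (5, H/2, W/2, C)


def cross_view_attention(global_tokens: np.ndarray, local_tokens: np.ndarray) -> np.ndarray:
    """Let distant-view tokens attend to close-up-view tokens.

    Hypothetical stand-in for a cross-view module (learned projections omitted).
    global_tokens: (N, D) queries from the distant view.
    local_tokens:  (M, D) keys/values gathered from all close-up views.
    Returns (N, D): each global token plus an attention-weighted sum of
    local tokens (residual connection).
    """
    d = global_tokens.shape[-1]
    scores = global_tokens @ local_tokens.T / np.sqrt(d)   # (N, M) similarities
    scores -= scores.max(axis=-1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over local tokens
    return global_tokens + weights @ local_tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))
    views = build_views(img)
    print(views.shape)  # (5, 32, 32, 3)

    # Stand-in "features": treat each pixel of a view as a 3-dim token.
    tokens = views.reshape(5, -1, 3)
    fused = cross_view_attention(tokens[0], tokens[1:].reshape(-1, 3))
    print(fused.shape)  # (1024, 3)
```

Batching the views is what lets one encoder share weights across the distant and close-up scales, which matches the single-stream design the abstract describes. In a real network the attention would run on learned encoder features with query/key/value projections rather than on raw pixels.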