CoBra: Complementary Branch Fusing Class and Semantic Knowledge for Robust Weakly Supervised Semantic Segmentation
arXiv (2024)
Abstract
Leveraging semantically precise pseudo masks derived from image-level class
knowledge for segmentation, namely image-level Weakly Supervised Semantic
Segmentation (WSSS), remains challenging. While Class Activation Maps (CAMs)
computed with CNNs have steadily contributed to the success of WSSS, the
resulting activation maps often focus narrowly on class-specific parts (e.g.,
only the face of a person). On the other hand, recent works based on vision
transformers (ViT) have shown promising results, using their self-attention
mechanism to capture semantic parts, but they fail to capture complete
class-specific details (e.g., they highlight the entire body of a person, but
also a dog nearby). In this work, we propose Complementary Branch (CoBra), a
novel dual-branch framework consisting of two distinct architectures that
provide each branch with valuable complementary knowledge of class (from the
CNN) and semantics (from the ViT). In particular, we learn a Class-Aware
Projection (CAP) for the CNN branch and a Semantic-Aware Projection (SAP) for
the ViT branch to explicitly fuse their complementary knowledge and enable a
new type of additional patch-level supervision. Through CoBra, our model fuses
the complementary outputs of the CNN and the ViT to create robust pseudo masks
that effectively integrate both class and semantic information. Extensive
qualitative and quantitative experiments on the PASCAL VOC 2012 dataset
investigate how the CNN and the ViT complement each other and show
state-of-the-art WSSS results, both for the masks generated by our model and
for the segmentation results obtained by using these masks as pseudo labels.
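To make the pipeline the abstract refers to more concrete, the sketch below shows two steps in NumPy. The first, `compute_cam`, is the standard CAM computation (Zhou et al., 2016): a per-class weighted sum of the last convolutional feature maps using the classifier weights, followed by ReLU and max-normalization. The second, `fuse_to_pseudo_mask`, is a hypothetical placeholder for combining a CNN map with a ViT map into a pseudo mask. It uses a simple weighted average plus a background threshold and is NOT CoBra's Class-Aware or Semantic-Aware Projection, which the abstract does not specify. All function names and the parameters `alpha` and `bg_threshold` are illustrative assumptions.

```python
import numpy as np


def compute_cam(features: np.ndarray, fc_weights: np.ndarray) -> np.ndarray:
    """Class Activation Maps in the style of Zhou et al. (2016).

    features:   (C, H, W) feature map from the last convolutional layer.
    fc_weights: (K, C) weights of the linear classifier applied after
                global average pooling.
    Returns a (K, H, W) array, ReLU-ed and max-normalized per class to [0, 1].
    """
    # Weighted sum over channels: cam[k] = sum_c w[k, c] * features[c]
    cam = np.einsum("kc,chw->khw", fc_weights, features)
    cam = np.maximum(cam, 0.0)  # keep only positive evidence for each class
    peak = cam.reshape(cam.shape[0], -1).max(axis=1)
    # Guard against division by zero for classes with no positive activation.
    return cam / np.maximum(peak, 1e-8)[:, None, None]


def fuse_to_pseudo_mask(cnn_map, vit_map, image_labels, alpha=0.5, bg_threshold=0.3):
    """HYPOTHETICAL fusion (weighted average), not CoBra's CAP/SAP projections.

    cnn_map, vit_map: (K, H, W) per-class scores in [0, 1].
    image_labels:     (K,) boolean array of classes present in the image.
    Returns an (H, W) int mask: 0 = background, k + 1 = class k.
    """
    fused = alpha * cnn_map + (1.0 - alpha) * vit_map
    # Only classes named in the image-level label may appear in the mask.
    fused = np.where(image_labels[:, None, None], fused, 0.0)
    best = fused.argmax(axis=0)
    score = fused.max(axis=0)
    # Pixels whose best class score is too low are assigned to background.
    return np.where(score >= bg_threshold, best + 1, 0)


# Usage on random toy data (3 classes, 8 channels, 4x4 spatial grid).
rng = np.random.default_rng(0)
feats = rng.random((8, 4, 4))
weights = rng.standard_normal((3, 8))
cam = compute_cam(feats, weights)
vit = rng.random((3, 4, 4))  # stand-in for a ViT-derived per-class map
mask = fuse_to_pseudo_mask(cam, vit, np.array([True, False, True]))
print(mask.shape)  # (4, 4)
```

Restricting the fused map to the image-level classes reflects the weak-supervision setting: the only ground truth available is which classes are present, so the pseudo mask should never assign an absent class.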