Consensus Feature Network for Scene Parsing.

IEEE Transactions on Multimedia (2022)

Abstract
Scene parsing is challenging because it must assign a semantic category to every pixel of a scene image, so pixel-level features are required. However, the features of classification networks are dominated by the most discriminative portion of an object, so directly applying classification networks to scene parsing yields inconsistent predictions within one instance and among instances of the same category. To address this problem, we propose two transform units that learn pixel-level consensus features. The first is an Instance Consensus Transform (ICT) unit, which learns instance-level consensus features by aggregating features within the same instance. The second is a Category Consensus Transform (CCT) unit, which pursues category-level consensus features by keeping features consistent among instances of the same category in scene images. The proposed ICT and CCT units are lightweight, data-driven, and end-to-end trainable. The features learned by the two units are more coherent at both the instance level and the category level. Furthermore, we present the Consensus Feature Network (CFNet), which is built on the proposed ICT and CCT units. Experiments on four scene parsing benchmarks (Cityscapes, Pascal Context, CamVid, and COCO Stuff) show that the proposed CFNet learns pixel-level consensus features and obtains consistent parsing results.
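To make the notion of "consensus" concrete, the following is a minimal toy sketch and not the paper's ICT or CCT unit. The abstract describes those units as learned and data-driven, whereas this sketch assumes the group labels (instance ids or category ids) are already known and simply replaces each pixel's feature with the mean feature of its group. The function name `consensus_features` and the toy arrays are hypothetical, chosen only for illustration.

```python
import numpy as np

def consensus_features(features, labels):
    """Replace each pixel's feature with the mean feature of its group.

    features: (H, W, C) float array of per-pixel features.
    labels:   (H, W) int array of group ids (instance ids or category ids).
    Assumption: labels are given; the paper's units learn the grouping instead.
    """
    out = np.empty_like(features)
    for group in np.unique(labels):
        mask = labels == group                    # (H, W) boolean mask for this group
        out[mask] = features[mask].mean(axis=0)   # mean over the group's pixels, per channel
    return out

# Toy 2x3 image with 2 feature channels and two instances (ids 0 and 1).
features = np.array([[[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]],
                     [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]])
instance_ids = np.array([[0, 0, 1],
                         [0, 0, 1]])

smoothed = consensus_features(features, instance_ids)
```

Passing category ids instead of instance ids gives the category-level analogue: all pixels of the same category, across instances, share one feature vector.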
Keywords
Transforms, Semantics, Convolution, Feature extraction, Training, Network architecture, Information and communication technology, Scene Parsing, Instance Consensus Transform, Category Consensus Transform