Category-specific Semantic Coherency Learning for Fine-grained Image Recognition

MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020

Abstract
Existing deep-learning-based weakly supervised fine-grained image recognition (WFGIR) methods usually pick out discriminative regions directly from the high-level feature (HLF) maps. However, HLF maps are derived by spatial aggregation of convolutions, which is essentially a pattern-matching process that applies fixed filters, so they are ineffective at modeling visual content that has the same semantics but varying posture or perspective. We argue that, as a result, the discriminative regions selected for the same subcategory do not correspond semantically, which degrades WFGIR performance. To address this issue, we propose an end-to-end Category-specific Semantic Coherency Network (CSC-Net) to semantically align the discriminative regions of the same subcategory. Specifically, CSC-Net consists of: 1) a Local-to-Attribute Projecting Module (LPM), which automatically learns a set of latent attributes by collecting category-specific semantic details from the local regions while eliminating their varying spatial distributions; 2) a Latent Attribute Aligning (LAA) module, which aligns the latent attributes to specific semantics via graph convolution based on their discriminability, to achieve category-specific semantic coherency; and 3) an Attribute-to-Local Resuming Module (ARM), which restores the latent attributes to the original Euclidean space and constructs latent-attribute-aligned feature maps through a location-embedding graph unpooling operation. Finally, the new feature maps, which implicitly apply the category-specific semantic coherency, are used for more accurate localization of discriminative regions. Extensive experiments verify that, under the same settings, CSC-Net yields the best performance among the most competitive approaches on the CUB Bird, Stanford-Cars, and FGVC Aircraft datasets.
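The three-stage data flow described in the abstract (project local features to latent attributes, align the attributes on a graph, then map them back to spatial locations) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the softmax assignment, the cosine-similarity adjacency, and the reuse of the assignment matrix for unpooling are all assumptions chosen for illustration, and the learned graph-convolution weights are omitted.

```python
import math

def matmul(a, b):
    # Plain matrix product for lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def project_to_attributes(feats, proj):
    """Assumed LPM-like step: pool N local features (N x C) into K attributes.

    proj (K x C) stands in for learned projection weights (hypothetical).
    Each attribute is a softmax-weighted sum over all locations, so the
    spatial layout is discarded and only the assignment weights remember it.
    """
    scores = matmul(proj, transpose(feats))       # K x N
    assign = [softmax(row) for row in scores]     # each row sums to 1
    attrs = matmul(assign, feats)                 # K x C
    return attrs, assign

def align_attributes(attrs):
    """Assumed LAA-like step: one graph-convolution pass without weights.

    The adjacency is a row-normalised, non-negative cosine similarity
    between attributes (an illustrative choice, not taken from the paper).
    """
    norms = [math.sqrt(sum(x * x for x in a)) or 1.0 for a in attrs]
    adj = [[max(0.0, sum(x * y for x, y in zip(a, b)) / (na * nb))
            for b, nb in zip(attrs, norms)]
           for a, na in zip(attrs, norms)]
    adj = [[v / (sum(row) or 1.0) for v in row] for row in adj]
    mixed = matmul(adj, attrs)
    return [[max(0.0, v) for v in row] for row in mixed]  # ReLU

def resume_to_locations(attrs, assign):
    """Assumed ARM-like step: unpool K attributes back to N locations
    by reusing the transposed assignment matrix."""
    return matmul(transpose(assign), attrs)       # N x C

# Toy input: 4 local positions with 2 channels, 2 latent attributes.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
proj = [[1.0, 0.0], [0.0, 1.0]]
attrs, assign = project_to_attributes(feats, proj)
aligned_maps = resume_to_locations(align_attributes(attrs), assign)
```

The output `aligned_maps` has the same N x C shape as the input features, which is what would let a downstream region-localization step consume it in place of the original feature maps.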