Bi-Modal Progressive Mask Attention for Fine-Grained Recognition

IEEE Transactions on Image Processing (2020)

Abstract
Traditional fine-grained image recognition must distinguish different subordinate categories (e.g., bird species) based on the visual cues in raw images. Because inter-class variations are small and intra-class variations are large, capturing the subtle differences between these sub-categories is crucial but challenging for fine-grained recognition. Recently, aggregating the language modality has proven to be a successful technique for improving visual recognition in practice. In this paper, we introduce an end-to-end trainable Progressive Mask Attention (PMA) model for fine-grained recognition that leverages both visual and language modalities. Our Bi-Modal PMA model can not only capture the most discriminative part in the visual modality stage by stage through our mask-based approach, but also explore out-of-visual-domain knowledge from the language modality in an interactional …
Keywords
Visualization, Image recognition, Feature extraction, Annotations, Task analysis, Semantics, Streaming media, Fine-grained visual recognition, multi-modal analysis, deep neural networks, language modality