Dynamic Position-Aware Network For Fine-Grained Image Recognition

Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence (2021)

Abstract
Most weakly supervised fine-grained image recognition (WF-GIR) approaches focus on learning discriminative details that contain both visual variances and position clues. The position clues can be learnt indirectly from the context of the discriminative visual content; however, this causes the selected discriminative regions to contain non-discriminative information introduced by the position clues. This analysis motivates us to introduce position clues into the visual content directly, so that region selection focuses only on the visual variances and achieves more precise discriminative-region localization. Although important, position modelling usually requires substantial pixel/region annotations and is therefore labor-intensive. To address this issue, we propose an end-to-end Dynamic Position-aware Network (DP-Net) that directly incorporates position clues into the visual content and dynamically aligns them without extra annotations, eliminating the effect of position information on the discriminative variances among subcategories. In particular, DP-Net consists of: 1) a Position Encoding Module, which learns a set of position-aware parts by adding learnable position information to the horizontal/vertical visual content of images; 2) a Position-vision Aligning Module, which dynamically aligns the visual content and the learnable position information by performing graph convolution on the position-aware parts; and 3) a Position-vision Reorganization Module, which projects the aligned position clues and visual content back into Euclidean space to construct position-aware feature maps. Finally, the position-aware feature maps, which implicitly carry the aligned visual content and position clues, are used for more accurate discriminative-region localization. Extensive experiments verify that DP-Net yields the best performance under the same settings as the most competitive approaches on the CUB Bird, Stanford-Cars, and FGVC Aircraft datasets.
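As a rough illustration of how the three modules described in the abstract might fit together, the sketch below implements, in PyTorch, a position-encoding step that injects learnable horizontal/vertical embeddings and pools position-aware parts, a graph convolution over those parts, and a reorganization step that projects the aligned parts back onto the spatial grid. All class names, tensor shapes, the attention/adjacency constructions, and hyperparameters are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of the three DP-Net modules as described in the abstract.
# Shapes, names, and the graph construction are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionEncodingModule(nn.Module):
    """Adds learnable horizontal/vertical position embeddings to the visual
    content and pools it into a set of position-aware parts."""
    def __init__(self, channels: int, height: int, width: int, num_parts: int):
        super().__init__()
        # Separate learnable embeddings for the vertical and horizontal axes.
        self.row_embed = nn.Parameter(torch.zeros(1, channels, height, 1))
        self.col_embed = nn.Parameter(torch.zeros(1, channels, 1, width))
        # 1x1 convolution producing one attention map per part.
        self.part_attn = nn.Conv2d(channels, num_parts, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) backbone feature map.
        feat = feat + self.row_embed + self.col_embed            # inject position clues
        attn = torch.softmax(self.part_attn(feat).flatten(2), dim=-1)   # (B, P, HW)
        parts = torch.bmm(attn, feat.flatten(2).transpose(1, 2))        # (B, P, C)
        return parts


class PositionVisionAligningModule(nn.Module):
    """Aligns visual content and position information via a simple graph
    convolution over the position-aware parts."""
    def __init__(self, channels: int):
        super().__init__()
        self.gcn_weight = nn.Linear(channels, channels, bias=False)

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        # Fully connected part graph built from pairwise feature similarity.
        adj = torch.softmax(torch.bmm(parts, parts.transpose(1, 2)), dim=-1)  # (B, P, P)
        return F.relu(self.gcn_weight(torch.bmm(adj, parts)) + parts)


class PositionVisionReorganizationModule(nn.Module):
    """Projects the aligned parts back onto the spatial grid, yielding
    position-aware feature maps for discriminative-region localization."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Linear(channels, channels)

    def forward(self, parts: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # Similarity between grid positions and parts scatters the aligned
        # part features back into a (B, C, H, W) map.
        b, c, h, w = feat.shape
        grid = feat.flatten(2).transpose(1, 2)                                # (B, HW, C)
        sim = torch.softmax(torch.bmm(grid, parts.transpose(1, 2)), dim=-1)   # (B, HW, P)
        maps = torch.bmm(sim, self.proj(parts))                               # (B, HW, C)
        return maps.transpose(1, 2).reshape(b, c, h, w)
```

Under these assumptions, the modules would be chained on a backbone feature map, e.g. parts = enc(feat); parts = align(parts); pos_maps = reorg(parts, feat), and the resulting position-aware feature maps would then drive discriminative-region localization.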
Keywords
recognition, network, image, position-aware, fine-grained