Fine-Grained Visual Categorization: A Spatial-Frequency Feature Fusion Perspective

Min Wang, Peng Zhao, Xin Lu, Fan Min, Xizhao Wang

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Fine-grained visual categorization is a challenging issue owing to high intra-class and low inter-class variances. Classical approaches rely on pre-trained models or many fine annotations. In this paper, we observe that spatial and frequency information provides distinct image views, and propose a new spatial-frequency feature fusion (SFFF) perspective to handle this challenging issue. Specifically, we design a heterogeneous feature extraction loss function, construct a global and local fusion SFFF network, and propose an importance-sparsity selection strategy. For feature extraction, we focus on the frequency domain feature learning network, extract fine-grained features, and achieve feature complementarity. For feature selection, we propose importance ranking and sparse regularity to constrain spatial-frequency features. For feature fusion, we design a spatial-frequency loss and an inter-layer switching strategy to achieve local-global collaboration. Comparative experiments were performed on popular fine-grained datasets and classic datasets such as CUB200-2011, Stanford Cars, Stanford Dogs, FGVC-Aircraft, and CIFAR100. The effectiveness and outstanding performance of SFFF are confirmed by comparisons with more than 40 state-of-the-art fine-grained categorization methods. Ablation studies and visualizations are provided to facilitate an understanding of our approach.
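To make the idea of "distinct image views" concrete, the following minimal sketch computes a frequency-domain view of an image with a 2D FFT and concatenates it with the spatial view. It is an illustration only and not the SFFF network described in the abstract: the function names `frequency_view` and `fuse_views`, the log-magnitude spectrum, and the plain concatenation are all assumptions chosen for brevity, and the learned extraction, selection, and fusion components are omitted.

```python
import numpy as np


def frequency_view(image: np.ndarray) -> np.ndarray:
    """Return the log-magnitude of the centered 2D FFT of a grayscale image.

    Hypothetical helper: a hand-crafted stand-in for a learned
    frequency-domain feature extractor.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.log1p(np.abs(spectrum))


def fuse_views(image: np.ndarray) -> np.ndarray:
    """Concatenate standardized spatial and frequency views into one vector.

    Hypothetical helper: plain concatenation, not the paper's fusion strategy.
    """
    def standardize(x: np.ndarray) -> np.ndarray:
        return (x - x.mean()) / (x.std() + 1e-8)

    image = image.astype(np.float64)
    spatial = standardize(image).ravel()
    frequency = standardize(frequency_view(image)).ravel()
    return np.concatenate([spatial, frequency])


# Synthetic 32x32 grayscale image standing in for a real sample.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
fused = fuse_views(image)
print(fused.shape)  # (2048,)
```

Standardizing each view before concatenation keeps the two views on a comparable scale, since raw pixel intensities and spectral magnitudes differ by orders of magnitude.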
Keywords
Fine-grained recognition, frequency domain learning, deep fusion, weakly supervised learning, training from scratch