DIML: Deep Interpretable Metric Learning via Structural Matching.

IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)

Abstract
In this paper, we present a new framework named DIML to achieve more interpretable deep metric learning. Unlike traditional deep metric learning methods that simply produce a global similarity given two images, DIML computes the overall similarity as a weighted sum of multiple local part-wise similarities, making it easier for humans to understand how the model distinguishes two images. Specifically, we propose a structural matching strategy that explicitly aligns the spatial embeddings by computing an optimal matching flow between the feature maps of the two images. We also devise a multi-scale matching strategy, which considers both global and local similarities and can significantly reduce the computational cost of image retrieval. To handle view variance in complicated scenarios, we propose to use cross-correlation as the marginal distribution of the optimal transport problem, leveraging semantic information to locate the important regions in the images. Our framework is model-agnostic and can be applied to off-the-shelf backbone networks and metric learning methods. To extend DIML to more advanced architectures such as vision Transformers (ViTs), we further propose truncated attention rollout and partial similarity to overcome the lack of locality in ViTs. We evaluate our method on three major deep metric learning benchmarks, CUB200-2011, Cars196, and Stanford Online Products, and achieve substantial improvements over popular metric learning methods with better interpretability.
Keywords
Distance metric learning, interpretable AI, visual recognition
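
The structural matching idea in the abstract (a matching flow between two feature maps, with cross-correlation marginals, giving a weighted sum of part-wise similarities) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that each feature map is flattened to an (N, D) array of spatial embeddings, that the matching cost is one minus cosine similarity, that the marginals come from a clipped dot product with the other image's mean feature, and that the flow is approximated with entropy-regularized Sinkhorn iterations. All function names and the `reg` and `n_iters` parameters are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each spatial embedding to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def cross_correlation_marginal(own, other, eps=1e-8):
    # Assumption: weight each location by its clipped similarity to the
    # other image's mean feature, then normalize to a distribution.
    global_other = other.mean(axis=0)
    weights = np.maximum(own @ global_other, 0.0) + eps
    return weights / weights.sum()


def sinkhorn(cost, mu, nu, reg=0.05, n_iters=200):
    # Entropy-regularized optimal transport via Sinkhorn scaling.
    kernel = np.exp(-cost / reg)
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (kernel.T @ u)
        u = mu / (kernel @ v)
    return u[:, None] * kernel * v[None, :]


def structural_similarity(feat_a, feat_b, reg=0.05, n_iters=200):
    # feat_a, feat_b: (N, D) arrays of spatial embeddings (assumed layout).
    a = l2_normalize(feat_a)
    b = l2_normalize(feat_b)
    sim = a @ b.T                          # part-wise cosine similarities
    mu = cross_correlation_marginal(a, b)  # importance of locations in A
    nu = cross_correlation_marginal(b, a)  # importance of locations in B
    plan = sinkhorn(1.0 - sim, mu, nu, reg, n_iters)
    # Overall similarity: part-wise similarities weighted by the flow.
    score = float((plan * sim).sum())
    return score, plan, sim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Non-negative random features stand in for post-ReLU feature maps.
    image_a = np.abs(rng.normal(size=(16, 32)))
    image_b = np.abs(rng.normal(size=(16, 32)))
    score, plan, sim = structural_similarity(image_a, image_b)
    print("similarity:", score)
    print("largest flow entry (row, col):",
          np.unravel_index(plan.argmax(), plan.shape))
```

In this sketch, the `plan` matrix is what makes the score inspectable: each entry indicates how much a pair of locations contributes to the overall similarity, so the largest entries point to the matched regions that dominate the decision.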