FMAP: Learning robust and accurate local feature matching with anchor points

Expert Systems with Applications (2024)

Abstract
Local feature matching is the task of establishing pixel-wise correspondences between a pair of images. As an integral component of many computer vision applications (e.g., visual localization), this task has been performed successfully by Transformer-based methods. However, these methods typically extract numerous keypoints from texture-sparse regions to construct a densely connected graph neural network (GNN) for long-range feature aggregation, which inevitably triggers redundant message exchange and hampers the learning process. Furthermore, they employ Transformer encoder layers that treat images as 1D sequences, leaving them unable to extract multiscale local structural information from the images, which is critical for establishing correspondences between image pairs with significant scale shifts. In this study, we develop FMAP, a detector-free approach for accurate local feature matching. To address the first issue, FMAP employs an anchor point feature aggregation module (APAM) that captures representative keypoints and discards extraneous ones to build a sparsified GNN for compact yet clean message exchange, based on the key insight that keypoints containing abundant visual information are distinguishable from their neighbors. To address the second issue, FMAP introduces a global-local multiscale perception module (GMPM), which incorporates rich multiscale local context into the global feature representation by employing multiple depth-wise convolutions with varying kernel sizes, thereby generating discriminative features that are robust to scale shifts. In addition, depth-wise convolutions are used in the feed-forward network of the GMPM to further fuse global context with local feature representations. Extensive experiments on several standard benchmarks demonstrate that the proposed FMAP method significantly outperforms state-of-the-art methods. In the relative pose estimation task, FMAP surpasses the cutting-edge methods MatchFormer, QuadTree, and TopicFM by 2.27%, 0.58%, and 1.08% in AUC@5°, and outperforms the baseline LoFTR by (2.38%, 1.89%, 1.45%) in AUC@(5°, 10°, 20°). Moreover, when integrated into an official visual localization framework, FMAP exceeds LoFTR by 2.3% in AP.
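The abstract gives no implementation details for APAM. As a rough illustration only, the following PyTorch sketch shows one plausible reading of the idea: score each keypoint descriptor by how distinguishable it is from the others, keep the top-k as anchors, and let all keypoints exchange messages with the anchors alone, yielding a sparsified attention graph. The function names, the distinctiveness score, and the choice of k are all assumptions, not the paper's method.

# Hypothetical sketch of anchor-point selection and sparsified message
# passing in the spirit of APAM. Not the authors' implementation.
import torch
import torch.nn.functional as F

def select_anchors(desc, k):
    """desc: (N, D) keypoint descriptors; returns indices of k anchors."""
    desc = F.normalize(desc, dim=-1)
    sim = desc @ desc.t()                    # (N, N) cosine similarity
    sim.fill_diagonal_(0.0)
    # Assumed scoring rule: a keypoint whose descriptor is unlike all
    # others is treated as "distinguishable" and kept as an anchor.
    distinctiveness = 1.0 - sim.max(dim=-1).values
    return distinctiveness.topk(k).indices   # (k,) anchor indices

def anchor_attention(desc, anchors):
    """Aggregate messages from the k anchors only, not all N keypoints."""
    q = desc                                  # (N, D) queries
    kv = desc[anchors]                        # (k, D) keys/values
    attn = torch.softmax(q @ kv.t() / desc.shape[-1] ** 0.5, dim=-1)
    return desc + attn @ kv                   # residual message passing

desc = torch.randn(1024, 256)
anchors = select_anchors(desc, k=128)
out = anchor_attention(desc, anchors)
print(out.shape)  # torch.Size([1024, 256])

Compared with dense all-to-all attention, this reduces the message-exchange cost from O(N^2) to O(Nk), which is the efficiency argument the abstract makes for the sparsified GNN.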
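Likewise, the multiscale mechanism behind GMPM can be sketched from the abstract's description alone: parallel depth-wise convolutions with different kernel sizes inject local structural context at several scales into the feature map before or inside the attention layers. The module name, kernel sizes, and fusion scheme below are assumptions for illustration.

# Hypothetical sketch of the multiscale depth-wise convolution idea
# described for GMPM. Names and structure are assumptions.
import torch
import torch.nn as nn

class MultiscaleDWConv(nn.Module):
    """Parallel depth-wise convolutions with varying kernel sizes,
    fused by a 1x1 point-wise convolution and a residual connection."""
    def __init__(self, dim, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim)  # depth-wise
            for k in kernel_sizes
        ])
        self.fuse = nn.Conv2d(dim * len(kernel_sizes), dim, 1)  # point-wise

    def forward(self, x):                     # x: (B, C, H, W) feature map
        local = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(local)           # inject multiscale local context

feat = torch.randn(2, 256, 60, 80)
out = MultiscaleDWConv(256)(feat)
print(out.shape)  # torch.Size([2, 256, 60, 80])

Because each branch is depth-wise (groups=dim), the added cost is small relative to full convolutions, while the differing receptive fields supply the scale-robust local cues the abstract attributes to GMPM.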
Keywords
Local feature matching, Anchor points, Transformer