EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization
CoRR (2024)
Abstract
Camera relocalization is pivotal in computer vision, with applications in AR,
drones, robotics, and autonomous driving. It estimates 3D camera position and
orientation (6-DoF) from images. Unlike traditional methods such as SLAM, recent
approaches use deep learning for direct end-to-end pose estimation. We propose
EffLoc, a novel efficient Vision Transformer for single-image camera
relocalization. EffLoc's hierarchical layout, memory-bound self-attention, and
feed-forward layers boost memory efficiency and inter-channel communication.
The sequential group attention (SGA) module we introduce enhances computational
efficiency by diversifying input features, reducing redundancy, and expanding
model capacity. EffLoc excels in efficiency and accuracy, outperforming prior
methods such as AtLoc and MapNet. It performs well on large-scale outdoor
car-driving scenarios while remaining simple and end-to-end trainable, without
handcrafted loss functions.