Aligning Knowledge Graph with Visual Perception for Object-goal Navigation
CoRR (2024)
Abstract
Object-goal navigation is a challenging task that requires guiding an agent
to specific objects based on first-person visual observations. The agent's
ability to comprehend its surroundings plays a crucial role in achieving
successful object finding. However, existing knowledge-graph-based navigators
often rely on discrete categorical one-hot vectors and a vote-counting strategy
to construct graph representations of scenes, which results in misalignment
with visual images. To provide more accurate and coherent scene descriptions
and address this misalignment issue, we propose the Aligning Knowledge Graph
with Visual Perception (AKGVP) method for object-goal navigation. Technically,
our approach introduces continuous modeling of the hierarchical scene
architecture and leverages visual-language pre-training to align natural
language descriptions with visual perception. The integration of a continuous
knowledge graph architecture and multimodal feature alignment empowers the
navigator with a remarkable zero-shot navigation capability. We extensively
evaluate our method using the AI2-THOR simulator and conduct a series of
experiments to demonstrate the effectiveness and efficiency of our navigator.
Code available: https://github.com/nuoxu/AKGVP.
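
The contrast between discrete one-hot node features and continuous ones can be illustrated with a toy sketch. This is not the AKGVP implementation: the category list, the embedding values, and the helper names below are hypothetical, and the hand-written vectors merely stand in for features that a visual-language model would produce. The point is only that one-hot vectors treat every pair of distinct categories as equally unrelated, while continuous features can express graded similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical object categories (assumption, not taken from the paper).
CATEGORIES = ["mug", "cup", "sofa"]

def one_hot(name):
    """Discrete categorical encoding: 1 at the category index, 0 elsewhere."""
    return [1.0 if c == name else 0.0 for c in CATEGORIES]

# Toy continuous embeddings with made-up values (assumption). In practice
# these would come from a pre-trained visual-language encoder.
TOY_EMBEDDINGS = {
    "mug": [0.9, 0.1, 0.0],
    "cup": [0.8, 0.2, 0.1],
    "sofa": [0.0, 0.1, 0.9],
}

# One-hot: any two distinct categories are orthogonal.
one_hot_sim = cosine(one_hot("mug"), one_hot("cup"))

# Continuous: related objects score higher than unrelated ones.
cont_sim_close = cosine(TOY_EMBEDDINGS["mug"], TOY_EMBEDDINGS["cup"])
cont_sim_far = cosine(TOY_EMBEDDINGS["mug"], TOY_EMBEDDINGS["sofa"])

print(f"one-hot  mug vs cup : {one_hot_sim:.3f}")
print(f"embedded mug vs cup : {cont_sim_close:.3f}")
print(f"embedded mug vs sofa: {cont_sim_far:.3f}")
```

Under these assumed values, the one-hot similarity between "mug" and "cup" is exactly zero, whereas the continuous vectors rank "cup" far closer to "mug" than "sofa" is.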