Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
arXiv (2024)
Abstract
Vision-and-language navigation (VLN) enables an agent to navigate to a
remote location in 3D environments by following natural language instructions.
At each navigation step, the agent selects one of the candidate locations
and then moves to it. For better navigation planning, the lookahead
exploration strategy aims to effectively evaluate the agent's next action by
accurately anticipating the future environment of candidate locations. To this
end, some existing works predict RGB images for future environments, but this
strategy suffers from image distortion and high computational cost. To address
these issues, we propose a pre-trained hierarchical neural radiance
representation model (HNR) to produce multi-level semantic features for future
environments, which are more robust and efficient than pixel-wise RGB
reconstruction. Furthermore, with the predicted future environmental
representations, our lookahead VLN model is able to construct the navigable
future path tree and select the optimal path via efficient parallel evaluation.
Extensive experiments on the VLN-CE datasets confirm the effectiveness of our
method.
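
The lookahead idea described above (expand candidate locations into a future path tree, then score all paths together and pick the best) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PathNode`, `build_path_tree`, `enumerate_paths`, `select_best_path`, and the `score_batch` callback are all hypothetical, the navigation graph is a toy dictionary, and a single batched scoring call merely stands in for the parallel evaluation that the abstract mentions. In the actual method the scores would presumably come from the predicted HNR features and the instruction, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Path = Tuple[str, ...]


@dataclass
class PathNode:
    """One location in the (hypothetical) future path tree."""
    location: str
    children: List["PathNode"] = field(default_factory=list)


def build_path_tree(start: str, neighbors: Dict[str, List[str]], depth: int) -> PathNode:
    """Expand navigable candidate locations up to `depth` steps ahead."""
    node = PathNode(start)
    if depth > 0:
        for nxt in neighbors.get(start, []):
            node.children.append(build_path_tree(nxt, neighbors, depth - 1))
    return node


def enumerate_paths(node: PathNode, prefix: Path = ()) -> List[Path]:
    """List every root-to-leaf path in the tree, in depth-first order."""
    path = prefix + (node.location,)
    if not node.children:
        return [path]
    paths: List[Path] = []
    for child in node.children:
        paths.extend(enumerate_paths(child, path))
    return paths


def select_best_path(
    paths: Sequence[Path],
    score_batch: Callable[[Sequence[Path]], Sequence[float]],
) -> Tuple[Path, float]:
    """Score all paths in one batched call and return the highest-scoring one."""
    scores = score_batch(paths)  # assumption: one call scores every path at once
    best = max(range(len(paths)), key=lambda i: scores[i])
    return paths[best], scores[best]


if __name__ == "__main__":
    # Toy navigation graph and toy per-location scores (illustrative only).
    neighbors = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"]}
    toy_score = {"A": 0, "B": 2, "C": 5, "D": 9, "E": 1, "F": 4}

    tree = build_path_tree("A", neighbors, depth=2)
    paths = enumerate_paths(tree)
    best, score = select_best_path(
        paths, lambda ps: [sum(toy_score[loc] for loc in p) for p in ps]
    )
    print(best, score)
```

Scoring whole paths rather than single next steps is what lets a locally weaker candidate (here `B`) win when it leads to a better future location.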