Robust and Interpretable Grounding of Spatial References with Relation Networks

arxiv(2020)

引用 11|浏览0
暂无评分
摘要
Handling spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal representations of spatial concepts that generalize well across a variety of observations and text instructions. In this work, we develop accurate models for understanding spatial references in text that are also robust and interpretable. We design a text-conditioned relation network whose parameters are dynamically computed with a cross-modal attention module to capture fine-grained spatial relations between entities. Our experiments across three different prediction tasks demonstrate the effectiveness of our model compared to existing state-of-the-art systems. Our model is robust to both observational and instructional noise, and lends itself to easy interpretation through visualization of intermediate outputs.
更多
查看译文
关键词
spatial references,interpretable grounding,relation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要