Video Object Linguistic Grounding

Alba Herrera-Palacio, Carles Ventura, Xavier Giro-i-Nieto

1st International Workshop on Multimodal Understanding and Learning for Embodied Applications (2019)

Abstract
The goal of this work is to segment, in a video sequence, the objects mentioned in a linguistic description of the scene. We adapted an existing deep neural network that achieves state-of-the-art performance in semi-supervised video object segmentation by adding a linguistic branch that generates an attention map over the video frames, making the segmentation of the objects temporally consistent along the sequence.
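The abstract does not say how the linguistic branch computes its attention map. As a purely illustrative sketch, assume a sentence embedding and per-pixel frame features of the same dimension C are already available (both are assumptions). Each spatial location can then be scored by a scaled dot product with the sentence embedding, squashed by a sigmoid into a weight in [0, 1], and used to reweight the frame features. The function names (`language_attention_map`, `attend`, `video_attention`) are hypothetical, and this scoring scheme is not claimed to be the authors' method.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors
AttentionMap = List[List[float]]  # H x W grid of weights in [0, 1]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def language_attention_map(frame: FeatureMap, sentence: Vector) -> AttentionMap:
    """Hypothetical: score each location by its scaled dot product with the sentence embedding."""
    scale = 1.0 / math.sqrt(len(sentence))
    return [[sigmoid(dot(feat, sentence) * scale) for feat in row] for row in frame]


def attend(frame: FeatureMap, attention: AttentionMap) -> FeatureMap:
    """Reweight every feature vector by the attention value at its location."""
    return [
        [[a * x for x in feat] for feat, a in zip(row, att_row)]
        for row, att_row in zip(frame, attention)
    ]


def video_attention(video: List[FeatureMap], sentence: Vector) -> List[AttentionMap]:
    """One attention map per frame, all conditioned on the same sentence embedding (assumption)."""
    return [language_attention_map(frame, sentence) for frame in video]


# Usage: one 1x2 frame with C = 2; the first location matches the sentence embedding.
video = [[[[1.0, 0.0], [0.0, 1.0]]]]
maps = video_attention(video, [2.0, 0.0])
weighted = attend(video[0], maps[0])
```

A sigmoid is used here instead of a softmax so that each location receives an independent weight; a softmax would force the weights of a frame to sum to one, which seems less suited to descriptions that mention several objects. Conditioning every frame on the same sentence embedding is one plausible way such a branch could encourage consistent attention along the sequence, but the abstract does not confirm this design.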
Keywords
linguistics, neural networks, video object grounding