Video Relation Detection via Multiple Hypothesis Association

MM '20: The 28th ACM International Conference on Multimedia Seattle WA USA October, 2020(2020)

引用 25|浏览117
暂无评分
摘要
Video visual relation detection (VidVRD) aims at obtaining not only the trajectories of objects but also the dynamic visual relations between them. It provides abundant information for video understanding and can serve as a bridge between vision and language. Compared with visual relation detection on image, VidVRD requires one more step at last called visual relation association which associates relation segments across time dimension into video relations. This step plays an important role in the task but is less studied. Nevertheless, visual relation association is a difficult task as the association process is easily affected by inaccurate tracklet detection and relation prediction in the former steps. In this paper, we propose a novel relation association method called Multiple Hypothesis Association (MHA). It maintains multiple possible relation hypothesis during the association process in order to tolerate and handle the inaccurate or missing problem in the former steps and generate more accurate video relations. Our experiments on the benchmark datasets (Imagenet-VidVRD and VidOR) show that our method outperforms the state-of-the-art methods.
更多
查看译文
关键词
video visual relation detection, data association
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要