Is it Really Negative? Evaluating Natural Language Video Localization Performance on Multiple Reliable Videos Pool
arXiv (2023)
Abstract
With the explosion of multimedia content in recent years, Video Corpus Moment
Retrieval (VCMR), which aims to detect a video moment that matches a given
natural language query from multiple videos, has become a critical problem.
However, existing VCMR studies have a significant limitation since they have
regarded all videos not paired with a specific query as negative, neglecting
the possibility of including false negatives when constructing the negative
video set. In this paper, we propose an MVMR (Massive Videos Moment Retrieval)
task that aims to localize video frames within a massive video set, mitigating
the possibility of falsely distinguishing positive and negative videos. For
this task, we propose an automatic dataset construction framework that applies
textual and visual semantic matching evaluation methods to existing video
moment search datasets, and we introduce three MVMR datasets. To solve the MVMR
task, we further propose a strong method, CroCs, which employs cross-directional
contrastive learning that selectively identifies reliable and informative
negatives, enhancing the robustness of a model on the MVMR task. Experimental
results on the introduced datasets reveal that existing video moment search
models are easily distracted by negative video frames, whereas our model
performs significantly better.