UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
arXiv (2024)
Abstract
Temporal Action Detection (TAD) focuses on detecting pre-defined actions,
while Moment Retrieval (MR) aims to identify the events described by open-ended
natural language within untrimmed videos. Although the two tasks focus on
different types of events, we observe a significant connection between them.
For instance, most descriptions in MR involve multiple actions from TAD. In
this paper, we aim to
investigate the potential synergy between TAD and MR. Firstly, we propose a
unified architecture, termed Unified Moment Detection (UniMD), for both TAD and
MR. It transforms the inputs of the two tasks, namely actions for TAD or events
for MR, into a common embedding space, and utilizes two novel query-dependent
decoders to generate a uniform output of classification score and temporal
segments. Secondly, we explore the efficacy of two task fusion learning
approaches, pre-training and co-training, in order to enhance the mutual
benefits between TAD and MR. Extensive experiments demonstrate that the
proposed task fusion learning scheme enables the two tasks to help each other
and to outperform their separately trained counterparts. Impressively, UniMD
achieves state-of-the-art results on three paired datasets: Ego4D, Charades-STA,
and ActivityNet. Our code will be released at
https://github.com/yingsen1/UniMD.
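
To make the unified interface concrete, below is a minimal, hypothetical PyTorch sketch of the idea described in the abstract: an action name (TAD) or a sentence (MR) is embedded into a common query space, and two query-dependent decoders emit a per-frame classification score and a temporal segment. All module names, shapes, and the fusion-by-concatenation choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class UniMDSketch(nn.Module):
    """Hypothetical sketch of a unified moment-detection head.

    A shared text encoder (not shown) maps either a TAD action name or an
    MR sentence to a single query embedding; two query-dependent decoders
    then produce a per-frame relevance score and start/end offsets.
    """
    def __init__(self, dim: int = 256):
        super().__init__()
        # Classification decoder: query-conditioned relevance per frame.
        self.cls_decoder = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        # Regression decoder: per-frame distances to segment start/end.
        self.reg_decoder = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, video_feats: torch.Tensor, query_emb: torch.Tensor):
        # video_feats: (T, dim) clip features; query_emb: (dim,) text embedding.
        q = query_emb.unsqueeze(0).expand(video_feats.size(0), -1)
        fused = torch.cat([video_feats, q], dim=-1)  # condition frames on the query
        score = self.cls_decoder(fused).sigmoid().squeeze(-1)  # (T,) relevance
        offsets = self.reg_decoder(fused)                      # (T, 2) boundaries
        return score, offsets

# Usage with random stand-in features (128 frames, 256-d features):
model = UniMDSketch()
scores, segments = model(torch.randn(128, 256), torch.randn(256))
```

Because both tasks share this output format, the same head can be pre-trained on one task and fine-tuned on the other, or co-trained on both, which is the task fusion learning the paper investigates.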