Semantics Guided Contrastive Learning of Transformers for Zero-shot Temporal Activity Detection

WACV 2023

Abstract
Zero-shot temporal activity detection (ZSTAD) is the problem of simultaneous temporal localization and classification of activity segments that are previously unseen during training. This is achieved by transferring the knowledge learned from semantically-related seen activities. This ability to reason about unseen concepts without supervision makes ZSTAD very promising for applications where the acquisition of annotated training videos is difficult. In this paper, we design a transformer-based framework titled TranZAD, which streamlines the detection of unseen activities by casting ZSTAD as a direct set-prediction problem, removing the need for hand-crafted designs and manual post-processing. We show how a semantic information-guided contrastive learning strategy can effectively train TranZAD for the zero-shot setting, enabling the efficient transfer of knowledge from the seen to the unseen activities. To reduce confusion between unseen activities and unrelated background information in videos, we introduce a more efficient method of computing the background class embedding by dynamically adapting it as part of the end-to-end learning. Additionally, unlike existing work on ZSTAD, we do not assume knowledge of which classes are unseen during training and use the visual and semantic information of only the seen classes for the knowledge transfer. This makes TranZAD more viable for practical scenarios, which we demonstrate through extensive experiments on Thumos'14 and Charades.
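The semantics-guided contrastive objective and the dynamically learned background embedding described above can be pictured with a minimal PyTorch-style sketch. This is an illustrative assumption of one plausible realization, not the authors' implementation: the class name SemanticContrastiveHead, the choice of word vectors as label embeddings, and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticContrastiveHead(nn.Module):
    """Hypothetical sketch of a semantics-guided contrastive objective with a
    learnable background embedding that is trained end-to-end with the detector."""

    def __init__(self, visual_dim, semantic_dim, seen_class_embeds, temperature=0.07):
        super().__init__()
        # Project transformer query embeddings into the label-semantic space.
        self.proj = nn.Linear(visual_dim, semantic_dim)
        # Fixed semantic embeddings of the K seen classes (K x semantic_dim),
        # e.g. word vectors of the seen class names.
        self.register_buffer("class_embeds", F.normalize(seen_class_embeds, dim=-1))
        # Background embedding learned jointly with the rest of the model.
        self.bg_embed = nn.Parameter(torch.randn(1, semantic_dim) * 0.02)
        self.temperature = temperature

    def forward(self, query_feats, targets):
        # query_feats: (N, visual_dim) decoder query embeddings
        # targets:     (N,) indices in [0, K-1] for seen classes, K for background
        z = F.normalize(self.proj(query_feats), dim=-1)                       # (N, D)
        classes = torch.cat(
            [self.class_embeds, F.normalize(self.bg_embed, dim=-1)], dim=0)   # (K+1, D)
        logits = z @ classes.t() / self.temperature                           # (N, K+1)
        # InfoNCE-style loss: pull each query toward its matched class embedding
        # and push it away from the others (including the background embedding).
        return F.cross_entropy(logits, targets)
```

At test time on unseen activities, `class_embeds` would simply be replaced with the unseen-class label embeddings, leaving the learned visual-to-semantic projection and background embedding unchanged.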
Keywords
activity, transformers, learning, detection, temporal, zero-shot