Semantic Association Network for Video Corpus Moment Retrieval

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)(2022)

This paper proposes the Semantic Association Network (SAN) for Video Corpus Moment Retrieval (VCMR), which localizes the temporal moment that best corresponds to a given text query within a corpus of videos. Collaboration among common semantics from multi-modal inputs is essential for effectively understanding a video together with its subtitle and the text query. To enable this collaboration, SAN associates common semantics within the same modality (via Intra Semantic Association) and across different modalities (via Inter Semantic Association) using a dedicated module referred to as Modality Semantic Association (MSA). SAN surpasses existing state-of-the-art performance on the TVR and DiDeMo benchmark datasets. Extensive ablation studies and qualitative analyses demonstrate the effectiveness of the proposed model.
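The abstract does not specify the internal design of the MSA module, but the intra/inter association described above can be illustrated with a common pattern: self-attention within a modality followed by cross-attention to another modality. The sketch below is a hypothetical interpretation (class name, dimensions, and attention choice are assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn

class ModalitySemanticAssociation(nn.Module):
    """Illustrative sketch of an MSA-style block (not the paper's exact design).

    Intra Semantic Association: self-attention within one modality.
    Inter Semantic Association: cross-attention to another modality.
    """

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, other):
        # Associate common semantics within the same modality.
        h, _ = self.intra(x, x, x)
        # Associate semantics across modalities by attending to `other`.
        out, _ = self.inter(h, other, other)
        return out

# Toy usage: fuse video frame features with text query token features.
video = torch.randn(2, 10, 256)  # (batch, frames, dim)
text = torch.randn(2, 6, 256)    # (batch, tokens, dim)
msa = ModalitySemanticAssociation()
fused = msa(video, text)         # shape: (2, 10, 256)
```

In this sketch the output keeps the query modality's sequence length, so video features remain frame-aligned after absorbing textual context, which is the property a moment-localization head would need.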
Key words
Video Corpus Moment Retrieval, Video Moment Retrieval, Temporal Moment Localization, Localizing Moment, Vision Language Task