STRONG: Spatio-Temporal Reinforcement Learning for Cross-Modal Video Moment Localization
MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020, pp. 4162-4170.
In this article, we tackle the cross-modal video moment localization issue, namely, localizing the most relevant video moment in an untrimmed video given a sentence as the query. The majority of existing methods focus on generating video moment candidates with the help of multi-scale sliding window segmentation. They hence inevitably suff...