Weakly-Supervised Video Object Grounding by Exploring Spatio-Temporal Contexts
MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020, pp. 1939-1947.
Grounding objects in visual context from natural language queries is a crucial yet challenging vision-and-language task, which has gained increasing attention in recent years. Existing work has primarily investigated this task in the context of still images. Despite their effectiveness, these methods cannot be directly migrated into the v...