Hierarchical Vision-Language Alignment for Video Captioning
MMM, pp. 42-54, 2019.
Video captioning has seen promising advances in recent years, yet it remains a challenging task because it is hard to capture the semantic correspondences between visual content and language descriptions. Different granularities of language components (e.g. words, phrases, and sentences) correspond to different granularities of...
Best Paper of MMM, 2019