Search-oriented Micro-video Captioning

International Multimedia Conference (2022)

Abstract
Pioneering efforts have been dedicated to content-oriented video captioning, which generates relevant sentences describing the visual content of a given video from the producer's perspective. By contrast, this work targets search-oriented captioning, which summarizes the given video by generating query-like sentences from the consumer's angle. Beyond relevance, diversity is vital for characterizing consumers' search intentions from different aspects. To this end, we devise a large-scale multimodal pre-training network, regularized by five tasks, to strengthen the downstream video representation; it is pre-trained on 11M micro-videos that we collected. We then present a flow-based diverse captioning model that generates different captions reflecting consumers' search demands. This model is optimized via a reconstruction loss and a KL divergence between the prior and the posterior. We evaluate our model on a golden dataset that we constructed, comprising 690k pairs, and the experimental results demonstrate its superiority.
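The training objective named in the abstract, a reconstruction loss plus a KL divergence between the prior and the posterior, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the form of either distribution, so the sketch assumes diagonal Gaussians and leaves out the flow transformation entirely. The function names and the `kl_weight` parameter are hypothetical.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Assumption: both the posterior q and the prior p are diagonal Gaussians
    parameterized by a mean and a log-variance per dimension.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def reconstruction_nll(token_probs):
    """Negative log-likelihood of the reference caption.

    token_probs holds the probability the decoder assigned to each
    ground-truth token of the caption.
    """
    return -sum(math.log(p) for p in token_probs)


def captioning_loss(token_probs, mu_q, logvar_q, mu_p, logvar_p, kl_weight=1.0):
    """Reconstruction loss plus a weighted KL term (kl_weight is hypothetical)."""
    kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    return reconstruction_nll(token_probs) + kl_weight * kl
```

Under these assumptions, the KL term pulls the posterior over latent codes toward the prior, so that codes sampled from the prior at inference time can decode into different captions, while the reconstruction term keeps each caption faithful to its reference.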
Keywords
search-oriented, micro-video