Weakly-supervised image captioning based on rich contextual information

Multimedia Tools and Applications (2017)

Abstract
Automatic generation of image descriptions is a challenging task that attracts broad attention in artificial intelligence. Inspired by methods from computer vision and natural language processing, different approaches have been proposed to solve the problem. However, captions generated by existing approaches lack sufficient contextual information to describe the corresponding images completely. The labeled captions in the training set only describe images at a basic level and lack sufficient contextual annotations. In this paper, we propose a Weakly-supervised Image Captioning Approach (WICA) to generate captions containing rich contextual information, without complete annotations for the contextual information in datasets. We utilize encoder-decoder neural networks to extract basic captioning features and leverage object detection networks to identify contextual features. Then, we encode the two levels of features with a phrase-based language model in order to generate captions with rich contextual information. Comprehensive experimental results reveal that the proposed model outperforms the existing baselines in terms of the richness and reasonableness of contextual information for image captioning.
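The abstract describes a two-branch architecture: an encoder-decoder network for basic captioning features and an object detection network for contextual features, later fused by a phrase-based language model. The sketch below is a minimal, hypothetical illustration of such a two-branch setup in PyTorch, assuming a ResNet/LSTM encoder-decoder and a Faster R-CNN detector; all class names, dimensions, and the fusion step are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torchvision


class CaptionEncoderDecoder(nn.Module):
    """Encoder-decoder branch producing basic captioning features (assumed design)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        # Drop the classification head; keep the CNN as an image encoder.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.proj = nn.Linear(2048, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)      # (B, 2048) global image features
        h0 = self.proj(feats).unsqueeze(0)           # initialize LSTM hidden state
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(captions), (h0, c0))
        return out                                   # per-step basic captioning features


def contextual_features(images):
    """Object-detection branch supplying contextual cues (boxes, labels, scores)."""
    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
    detector.eval()
    with torch.no_grad():
        return detector(images)  # list of dicts per image
```

In the paper's pipeline, the detector outputs (object labels and their relations) would be turned into contextual phrases and combined with the decoder's features by the phrase-based language model; that fusion step is specific to WICA and is not reproduced here.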
Keywords
Image captioning, Weakly-supervised learning, Rich contextual information, Encoder-decoder neural networks, Object detection, Phrase-based language model