Weakly-supervised image captioning based on rich contextual information

Multimedia Tools and Applications (2017)

Abstract
Automatic generation of image descriptions is a challenging task that attracts broad attention in artificial intelligence. Inspired by methods from computer vision and natural language processing, different approaches have been proposed to solve the problem. However, captions generated by existing approaches lack sufficient contextual information to describe the corresponding images completely. The labeled captions in the training set only describe images at a basic level and lack sufficient contextual annotations. In this paper, we propose a Weakly-supervised Image Captioning Approach (WICA) to generate captions containing rich contextual information, without complete annotations for the contextual information in datasets. We utilize encoder-decoder neural networks to extract basic captioning features and leverage object detection networks to identify contextual features. Then, we encode the two levels of features with a phrase-based language model in order to generate captions with rich contextual information. Comprehensive experimental results reveal that the proposed model outperforms the existing baselines in terms of the richness and reasonableness of contextual information for image captioning.
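The abstract describes a two-branch architecture: an encoder-decoder network for basic captioning features and an object detection network for contextual features, later fused by a phrase-based language model. The sketch below is a minimal, hypothetical illustration of such a two-branch setup in PyTorch, assuming a ResNet/LSTM encoder-decoder and a Faster R-CNN detector; all class names, dimensions, and the fusion step are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torchvision


class CaptionEncoderDecoder(nn.Module):
    """Encoder-decoder branch producing basic captioning features (assumed design)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        # Drop the classification head; keep the CNN as an image encoder.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.proj = nn.Linear(2048, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)      # (B, 2048) global image features
        h0 = self.proj(feats).unsqueeze(0)           # initialize LSTM hidden state
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(captions), (h0, c0))
        return out                                   # per-step basic captioning features


def contextual_features(images):
    """Object-detection branch supplying contextual cues (boxes, labels, scores)."""
    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
    detector.eval()
    with torch.no_grad():
        return detector(images)  # list of dicts per image
```

In the paper's pipeline, the detector outputs (object labels and their relations) would be turned into contextual phrases and combined with the decoder's features by the phrase-based language model; that fusion step is specific to WICA and is not reproduced here.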
Keywords
Image captioning, Weakly-supervised learning, Rich contextual information, Encoder-decoder neural networks, Object detection, Phrase-based language model