Mind's Eye: A Recurrent Visual Representation for Image Caption Generation

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Abstract
In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. Critical to our approach is a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read. The representation automatically learns to remember long-term visual concepts. Our model is capable of both generating novel captions given an image, and reconstructing visual features given an image description. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human generated captions, our automatically generated captions are equal to or preferred by humans 21.0% of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
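The core idea of the abstract is a recurrent network that, while reading (or emitting) a caption word by word, maintains and updates a visual memory, so that after processing the sentence it holds a reconstruction of the scene's visual features. The toy sketch below illustrates that reading direction only; it is not the authors' implementation, and all weight names, dimensions, and the random initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, EMBED, HIDDEN, VISUAL = 50, 16, 32, 24

# Randomly initialized weights stand in for trained parameters (assumption).
Wx = rng.normal(scale=0.1, size=(HIDDEN, EMBED))   # word embedding -> hidden
Wh = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))  # hidden recurrence
Wv = rng.normal(scale=0.1, size=(HIDDEN, VISUAL))  # visual memory -> hidden
Uv = rng.normal(scale=0.1, size=(VISUAL, HIDDEN))  # hidden -> visual memory update
E  = rng.normal(scale=0.1, size=(VOCAB, EMBED))    # word embedding table

def read_sentence(word_ids):
    """Read a caption word by word, accumulating a visual memory v.

    After the whole sentence is read, v plays the role of the
    reconstructed visual feature vector (the "mind's eye" of the
    described scene)."""
    h = np.zeros(HIDDEN)
    v = np.zeros(VISUAL)
    for w in word_ids:
        # The hidden state sees the current word and the visual memory so far.
        h = np.tanh(Wx @ E[w] + Wh @ h + Wv @ v)
        # The visual memory is re-estimated from the updated hidden state.
        v = np.tanh(Uv @ h)
    return v

v = read_sentence([3, 17, 8, 42])
print(v.shape)  # (24,)
```

Generation runs the same loop in the other direction: the hidden state, conditioned on the (image-derived) visual vector, scores the next word at each step. Training both directions jointly is what lets the model generate captions and reconstruct visual features from descriptions.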
Keywords
mind's eye, recurrent visual representation, image caption generation, bidirectional mapping, sentence-based descriptions, recurrent neural network, long-term visual concepts, image description, sentence generation, sentence retrieval, image retrieval, human generated captions, automatically generated captions, visual features