Semantic Compositional Networks for Visual Captioning

CVPR 2017

Cited by 501 | Views 117
Abstract
A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each tag is used to compose the parameters in a long short-term memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an ensemble of tag-dependent weight matrices. The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag. In addition to captioning images, we also extend the SCN to generate captions for video clips. We qualitatively analyze semantic composition in SCNs, and quantitatively evaluate the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text. Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.
Keywords
SCN,image captioning,semantic concepts,LSTM,weight matrix,tag-dependent weight matrices,image caption,image-dependent probability,corresponding tag,semantic composition,visual captioning,long short-term memory network,semantic compositional network,semantic compositional networks
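The core mechanism in the abstract can be sketched in a few lines: each LSTM weight matrix becomes an ensemble of tag-dependent matrices, and the detected tag probabilities determine how strongly each ensemble member contributes. The sketch below is a minimal NumPy illustration of that composition, not the paper's implementation (the paper additionally factorizes the ensemble for efficiency); all dimensions and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5          # number of semantic tags (hypothetical)
d_in, d_out = 8, 16  # illustrative layer dimensions

# One weight matrix per tag: the "ensemble of tag-dependent weight matrices"
W_ensemble = rng.standard_normal((K, d_out, d_in))

# Image-dependent tag probabilities, e.g. from a tag detector (here random)
s = rng.random(K)

# Effective weight: each ensemble member weighted by its tag probability
W_eff = np.tensordot(s, W_ensemble, axes=1)  # shape (d_out, d_in)

# The composed matrix is then used like an ordinary LSTM weight
x = rng.standard_normal(d_in)
h = np.tanh(W_eff @ x)  # one gate pre-activation, shape (d_out,)
```

Because `W_eff` is a probability-weighted combination, an image where a tag (say "dog") is detected with high probability shifts the LSTM's parameters toward that tag's matrix, which is the semantic composition the abstract describes.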