Image captioning by incorporating affective concepts learned from both visual and textual components.

Neurocomputing (2019)

Abstract
Automatically generating a natural sentence that describes the content of an image has recently been extensively researched in artificial intelligence, and it bridges the gap between the computer vision and natural language processing communities. Most existing captioning frameworks rely heavily on the visual content while rarely accounting for sentimental information. In this paper, we introduce affective concepts to enhance the emotional expressibility of text descriptions. We achieve this goal by composing appropriate emotional concepts into sentences; these concepts are computed from large-scale visual and textual repositories by learning both content and linguistic modules. We extract visual and textual representations respectively, then combine the latent codes of the two components into a low-dimensional subspace. After that, we decode the combined latent representations and finally generate affective image captions. We evaluate our method on the SentiCap dataset, which was constructed with sentimental adjective-noun pairs, and assess the emotional descriptions with several qualitative and human evaluation metrics. The experimental results demonstrate the capability of our method to analyze the latent emotion of an image and to provide an affective description that caters to human cognition.
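The fusion step described above (projecting visual and textual representations into a shared low-dimensional subspace and combining their latent codes) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the feature dimensions, the linear projections, and the averaging fusion are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper)
D_VIS, D_TXT, D_LATENT = 2048, 300, 256

# Hypothetical linear encoders that project each modality into a
# shared low-dimensional subspace (the paper learns these jointly
# from content and linguistic modules)
W_vis = rng.standard_normal((D_VIS, D_LATENT)) / np.sqrt(D_VIS)
W_txt = rng.standard_normal((D_TXT, D_LATENT)) / np.sqrt(D_TXT)

def encode(vis_feat, txt_feat):
    """Map visual and textual features into the joint latent subspace."""
    z_vis = vis_feat @ W_vis
    z_txt = txt_feat @ W_txt
    # Combine the two latent codes; a simple average stands in here
    # for the paper's learned fusion
    return (z_vis + z_txt) / 2.0

vis = rng.standard_normal(D_VIS)   # e.g., CNN image features
txt = rng.standard_normal(D_TXT)   # e.g., word-embedding features
z = encode(vis, txt)
print(z.shape)  # (256,)
```

In the paper, the combined latent code `z` would then be passed to a decoder that generates the affective caption; the decoder is omitted here.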
Keywords
Image captioning, Affective concepts, Emotion recognition