Leveraging facial expressions as emotional context in image captioning

Riju Das, Nan Wu, Soumyabrata Dev

Multimedia Tools and Applications (2024)

Abstract
Image captioning generates natural-language descriptions of images that humans can read and understand. Most techniques and models in this domain focus on the factual elements of an image, using convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to generate captions. A limitation of these approaches is that they do not consider the emotions expressed by the main subject of an image, so the generated captions may fail to reflect the image's emotional content. To address this limitation, this paper builds an improved model that extracts human emotions from images and embeds emotional attributes into the generated captions. We use the widely available image captioning benchmark dataset Flickr8k. Our objective is a more appropriate model for images containing human faces, one that produces more accurate and impactful captions.
Keywords
Image captioning, Facial cues, Facial emotion recognition, Facial expression