A Visual Attention-Based Model for Bengali Image Captioning

SN Comput. Sci. (2023)

Abstract
Image caption (or description) generation is a fundamental task that combines ideas from computer vision (CV) and natural language processing (NLP) to recognize the context of an image and produce one or more descriptions of it in a natural language. Bengali is one of the world's most commonly spoken languages, ranking fifth. For this reason, considerable research effort has gone into Bengali image captioning, i.e. describing images with grammatically correct and semantically meaningful Bengali sentences. Many established datasets exist for image caption generation in English, but no standard dataset is available for Bengali. This paper proposes a model for automatically generating image captions in the Bengali language. The study uses only the two initially available Bengali datasets to train the encoder-decoder neural network model. We curated the datasets to correct the human errors they contained and used them for benchmarking. Experimental results show that the proposed model performs better than the baseline models on these datasets. It achieved BLEU-1 scores of 0.67 and 0.65 and BLEU-4 scores of 0.26 and 0.24, respectively. We expect that our research will draw the attention of more researchers to regional language understanding and accelerate Bengali vision-language understanding.
Keywords
visual, image, attention-based