Bornon: Bengali Image Captioning with Transformer-Based Deep Learning Approach

SN Computer Science (2021)

Abstract
Image captioning with an encoder–decoder approach, in which a CNN serves as the encoder and a sequence generator such as an RNN serves as the decoder, has proven very effective. However, this method has a drawback: the sequence must be processed in order. To overcome this drawback, some researchers have used the transformer model to generate captions from images using English datasets. However, none of them generated captions in Bengali using the transformer model. We therefore used three different Bengali datasets to generate Bengali captions from images with the transformer model. Additionally, we compared the performance of the transformer-based model with a visual attention-based encoder–decoder approach. Finally, we compared the results of the transformer-based model with those of other models that employed different Bengali image captioning datasets.
Keywords
Bengali image captioning, Transformer model, Visual attention, Bornon dataset