A CNN-LSTM based approach for image captioning

Esra Balık, Mehmet Kaya, Buket Kaya

7th IET Smart Cities Symposium (SCS 2023), 2023

Abstract
In this age of information and visual data, it has become necessary in many fields to draw meaningful conclusions from visual data and express them in a textual context. Visual data becomes more understandable and valuable when paired with a textual explanation, and such pairing can greatly accelerate the scanning of large amounts of data in many application areas. In fundamental fields such as education and medicine, visual-text pairs are important as teaching and reference materials. In addition, autonomous vehicles can be designed that allow disabled individuals to make use of visual and textual information together. Such studies have gained a wider range of applications, especially with the advancement of deep learning technology. In this study, a model for captioning visual data was built on the Flickr8k dataset using Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) technologies. The model provides an integrated structure for understanding visual data and producing textual descriptions. The accuracy of the generated captions was evaluated with the BLEU-1 metric. In addition, related studies were discussed and their reported performance was compared with this work.
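The abstract does not give implementation details, but a common way to realize such a CNN-LSTM captioner is to extract image features with a pretrained CNN and feed them, together with the embedded partial caption, into an LSTM branch whose output is merged with the image features to predict the next word. The sketch below assumes a Keras/TensorFlow implementation; the vocabulary size, embedding dimension, maximum caption length, and choice of InceptionV3-style 2048-dimensional features are illustrative assumptions, not values taken from the paper.

```python
# A minimal sketch of a CNN-LSTM ("merge") captioning model in Keras.
# Image features are assumed to be pre-extracted with a CNN such as
# InceptionV3 (2048-dim pooled output); vocabulary size, embedding
# dimension and maximum caption length below are illustrative
# assumptions, not values taken from the paper.
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000    # assumed tokenized vocabulary size for Flickr8k
MAX_LEN = 34         # assumed maximum caption length in tokens
EMBED_DIM = 256      # assumed embedding / hidden size
FEATURE_DIM = 2048   # pooled CNN feature size (e.g. InceptionV3)

# Image branch: project the CNN feature vector into the embedding space.
img_input = Input(shape=(FEATURE_DIM,), name="image_features")
img_dense = Dense(EMBED_DIM, activation="relu")(Dropout(0.5)(img_input))

# Text branch: embed the partial caption and summarize it with an LSTM.
txt_input = Input(shape=(MAX_LEN,), name="caption_tokens")
txt_embed = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(txt_input)
txt_lstm = LSTM(EMBED_DIM)(Dropout(0.5)(txt_embed))

# Merge both branches and predict the next word of the caption.
merged = add([img_dense, txt_lstm])
hidden = Dense(EMBED_DIM, activation="relu")(merged)
output = Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[img_input, txt_input], outputs=output)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```

BLEU-1, the metric named in the abstract, can be computed with NLTK's corpus_bleu using unigram-only weights; the reference and candidate captions below are placeholders, not outputs of the paper's model.

```python
# BLEU-1 with NLTK: unigram-only weights (1, 0, 0, 0).
from nltk.translate.bleu_score import corpus_bleu

references = [[["a", "dog", "runs", "on", "the", "beach"]]]
candidates = [["a", "dog", "is", "running", "on", "the", "beach"]]
bleu1 = corpus_bleu(references, candidates, weights=(1.0, 0, 0, 0))
print(f"BLEU-1: {bleu1:.3f}")
```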