Efficient Video Captioning with Frame Similarity-Based Filtering.

Elyas Rashno, Farhana H. Zulkernine

DEXA (2) (2023)

Abstract
Video captioning combines computer vision and Natural Language Processing (NLP) to perform the challenging task of scene understanding. Rapid advances in artificial intelligence have led to growing interest in video captioning, which involves generating natural language descriptions of the visual content of videos. In this paper, we present a novel approach to video caption generation. The proposed method first extracts frames from the video and reduces their number based on inter-frame similarity. The remaining frames are then processed by a Convolutional Neural Network (CNN) to extract feature vectors, which are fed into a Long Short-Term Memory (LSTM) network to generate the captions. The results are compared with state-of-the-art models, demonstrating that the proposed approach outperforms existing methods on the MSVD, M-VAD, and MPII-MD datasets.
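The similarity-based frame reduction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the similarity metric, so histogram intersection between consecutive frames and the `threshold` value are assumptions chosen for the example.

```python
import numpy as np

def filter_frames(frames, threshold=0.95, bins=32):
    """Keep a frame only if it differs enough from the last kept frame.

    frames: list of grayscale frames as NumPy arrays with values in [0, 255].
    Similarity is measured here with normalized histogram intersection
    (a placeholder; the paper's exact metric is not given in the abstract).
    """
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        h1, _ = np.histogram(kept[-1], bins=bins, range=(0, 255))
        h2, _ = np.histogram(frame, bins=bins, range=(0, 255))
        h1 = h1 / h1.sum()
        h2 = h2 / h2.sum()
        sim = np.minimum(h1, h2).sum()  # histogram intersection in [0, 1]
        if sim < threshold:  # frame is dissimilar enough: keep it
            kept.append(frame)
    return kept
```

Near-duplicate frames (similarity at or above the threshold) are dropped, so only the retained frames need to pass through the CNN feature extractor, which is the source of the efficiency gain.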
Keywords
efficient video captioning, filtering, similarity-based