Counting-based visual question answering with serial cascaded attention deep learning

Pattern Recognit. (2023)

Abstract
Counting-based questions make up a major part of Visual Question Answering (VQA), and the most challenging factor is counting the different objects present in an image. Recently, more attention has been paid to designing count-aided VQA models. A VQA system responds to a question with an appropriate answer, yet complex questions also need to be answered by the system. Earlier models still struggle to count the various objects within an image because they fail to select useful features and lack fine-grained representations. To preserve the image representation, this paper proposes a new VQA model based on a heuristic approach with serial cascaded deep learning methods. First, standard image and text data are gathered and pre-processed. Features are then extracted from both the image and the text data: deep image features are obtained with Visual Geometry Group 16 (VGG16), and text features are extracted with a Text Convolutional Neural Network (TCNN). Next, optimal weighted fused features are obtained, with the fusion weights tuned by the Improved Tuna Swarm Optimization (ITSO) algorithm. Finally, the counting answers to the given queries are produced by a Serial Cascaded Recurrent Neural Network with Attention Mechanism-based Long Short-Term Memory (SCRAM-LSTM). Performance is evaluated on several metrics and compared with conventional models. The findings indicate that the proposed model offers superior performance in estimating the appropriate answers. The proposed work is therefore suited to applications such as helping blind or visually impaired people obtain information and integrating with image retrieval systems and search engines, and more generally to vision-and-language systems.
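The weighted fusion step can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so this assumes one plausible reading: each feature is scaled by its own weight and the two modalities are then concatenated. The function name `weighted_fuse`, the toy feature values, and the hand-picked weights are all hypothetical; in the paper the weights are tuned by ITSO, which is not implemented here.

```python
from typing import List, Sequence


def weighted_fuse(
    image_feats: Sequence[float],
    text_feats: Sequence[float],
    image_weights: Sequence[float],
    text_weights: Sequence[float],
) -> List[float]:
    """Scale each feature by its weight, then concatenate both modalities.

    Assumption: per-feature scaling followed by concatenation. The paper's
    actual fusion rule may differ.
    """
    if len(image_feats) != len(image_weights):
        raise ValueError("image features and weights must have equal length")
    if len(text_feats) != len(text_weights):
        raise ValueError("text features and weights must have equal length")
    scaled_image = [w * f for w, f in zip(image_weights, image_feats)]
    scaled_text = [w * f for w, f in zip(text_weights, text_feats)]
    return scaled_image + scaled_text


# Toy stand-ins for VGG16 image features and TCNN text features.
image_feats = [0.5, 2.0, 1.0]
text_feats = [4.0, 0.25]

# Hypothetical, hand-picked weights; an optimizer such as ITSO would tune these.
fused = weighted_fuse(image_feats, text_feats, [1.0, 0.5, 0.0], [0.25, 2.0])
print(fused)
```

A weight of zero drops a feature from the fused vector's signal entirely, which is how a tuned weight vector can act as a soft feature selector before the fused features reach the answer model.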
Keywords
Counting-based visual question answering, Visual geometry group 16, Text convolutional neural network, Optimal weighted fused features, Improved tuna swarm optimization, Serial cascaded recurrent neural network with attention mechanism-based long short-term memory