Open-Ended Visual Question Answering Model For Remote Sensing Images

IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2022

Abstract
In this paper, we present an open-ended visual question answering (VQA) model for remote sensing images, in which answers are given as short sentences, unlike closed-ended VQA. The model uses vision and natural language transformers to embed the image and its related question. The resulting feature representations are concatenated and fed to a light transformer decoder that generates the answer autoregressively. The complete architecture is trained end to end via backpropagation. In the experiments, we evaluate the model on a manually labeled open-ended VQA dataset termed TextRS, composed of 6245 image-question pairs.
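
The data flow described in the abstract (embed the image and the question, concatenate the features, decode the answer token by token) can be sketched roughly as follows. This is a toy illustration in plain Python, not the authors' implementation: the encoders are simple stand-ins for the vision and language transformers, the decoder is reduced to a single cross-attention step, and the vocabulary, the feature width `D`, and all function names are hypothetical. The weights are random, so the generated tokens carry no meaning; only the structure of the pipeline is of interest.

```python
import math
import random

D = 8  # feature width (hypothetical)
VOCAB = ["<bos>", "<eos>", "a", "river", "crosses", "the", "city"]  # toy vocabulary
BOS, EOS = 0, 1

rng = random.Random(0)
# Toy token embedding table; a trained model would learn these weights.
TOKEN_EMB = [[rng.uniform(-1, 1) for _ in range(D)] for _ in VOCAB]


def encode_image(patches):
    """Stand-in for the vision transformer: one D-dim feature per image patch."""
    return [[math.tanh(sum(p) / len(p) * (i + 1)) for i in range(D)] for p in patches]


def encode_question(token_ids):
    """Stand-in for the language transformer: one D-dim feature per question token."""
    return [TOKEN_EMB[t] for t in token_ids]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decoder_step(memory, prefix):
    """One cross-attention step: the last generated token attends over the memory."""
    query = TOKEN_EMB[prefix[-1]]
    weights = softmax([dot(query, m) / math.sqrt(D) for m in memory])
    context = [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(D)]
    hidden = [q + c for q, c in zip(query, context)]  # residual connection
    return [dot(hidden, e) for e in TOKEN_EMB]  # logits over the vocabulary


def generate_answer(patches, question_ids, max_len=6):
    """Greedy autoregressive decoding over concatenated image and question features."""
    memory = encode_image(patches) + encode_question(question_ids)  # concatenation
    answer = [BOS]
    for _ in range(max_len):
        logits = decoder_step(memory, answer)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
        answer.append(next_id)
        if next_id == EOS:
            break
    return memory, answer[1:]  # drop the <bos> marker


# Two toy "patches" of pixel values and a two-token question.
patches = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]
question_ids = [2, 3]
memory, answer_ids = generate_answer(patches, question_ids)
print(len(memory), [VOCAB[i] for i in answer_ids])
```

In a trained system the stand-in encoders would be replaced by pretrained transformers and the decoder weights would be learned jointly with them through backpropagation, as the abstract states. Greedy selection is used here only for brevity; the abstract does not say which decoding strategy the authors use.
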
Keywords
Remote sensing, visual question answering, language, vision transformers