EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
CoRR(2024)
Abstract
Multi-modal large language models (MLLMs) have demonstrated remarkable
success in vision and vision-language tasks within the natural image domain.
However, the significant differences between natural images and remote
sensing (RS) images hinder the development of MLLMs in the RS domain.
Currently, a unified and powerful MLLM capable of handling various RS visual
tasks remains under-explored. To fill this gap, a pioneering MLLM called
EarthGPT is proposed for universal RS image comprehension, which integrates
various multi-sensor RS interpretation tasks uniformly. More importantly, a
large-scale multi-sensor multi-modal RS instruction-following dataset named
MMRS is carefully constructed; it comprises 1,005,842 image-text pairs based
on 34 existing diverse RS datasets and includes multi-sensor images such as
optical, synthetic aperture radar (SAR), and infrared. MMRS addresses the
lack of RS expert knowledge in MLLMs and stimulates the development of MLLMs
in the RS domain. Extensive experiments demonstrate EarthGPT's superior
performance in various RS visual interpretation tasks compared with other
specialist models and MLLMs, proving the effectiveness of the proposed
EarthGPT and providing a versatile paradigm for open-set reasoning tasks.