In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (2024)

Abstract
We investigate the utility of in-domain self-supervised pretraining of vision models for the analysis of remote sensing imagery. Self-supervised learning (SSL) has emerged as a promising approach for remote sensing image classification due to its ability to exploit large amounts of unlabeled data. Unlike traditional supervised learning, SSL learns representations of data without the need for explicit labels. This is achieved by formulating auxiliary tasks that are used to pretrain models before fine-tuning them on a given downstream task. A common approach to SSL pretraining in practice is to use standard pretraining datasets, such as ImageNet. While relevant, such a general-purpose approach can have a suboptimal influence on the downstream performance of models, especially on tasks from challenging domains such as remote sensing. In this letter, we analyze the effectiveness of SSL pretraining using the image bidirectional encoder representations from transformers (BERT) pretraining with online tokenizer (iBOT) framework coupled with vision transformers (ViTs) trained on the million aerial image dataset (Million-AID), a large, unlabeled remote sensing dataset. We present a comprehensive study of different self-supervised pretraining strategies and evaluate their effect across 14 downstream datasets with diverse properties. Our results demonstrate that leveraging large in-domain datasets for self-supervised pretraining consistently improves downstream predictive performance compared with the standard approaches found in practice.
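The pipeline the abstract describes, pretraining a ViT backbone with iBOT on unlabeled Million-AID imagery and then fine-tuning it on a labeled scene classification dataset, follows a standard transfer learning recipe. The sketch below illustrates only the downstream fine-tuning step, assuming a ViT-Base backbone and PyTorch/timm tooling; the checkpoint path, dataset directory, class count, and hyperparameters are hypothetical placeholders rather than the authors' configuration, and the iBOT pretraining itself is not shown.

```python
# Minimal sketch: fine-tuning a ViT backbone pretrained in-domain
# (e.g., with iBOT on Million-AID) for remote sensing scene classification.
# Paths, class count, and hyperparameters are hypothetical placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import timm

NUM_CLASSES = 45                                   # hypothetical downstream benchmark size
CKPT_PATH = "ibot_millionaid_vit_base.pth"         # hypothetical in-domain SSL checkpoint
DATA_DIR = "scene_dataset/train"                   # hypothetical ImageFolder-style dataset

# Standard ImageNet-style preprocessing; SSL pipelines typically use similar statistics.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder(DATA_DIR, transform=transform)
loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

# Build a ViT-Base backbone with a fresh classification head for the downstream task.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=NUM_CLASSES)

# Load the self-supervised backbone weights; the classifier head is trained from scratch,
# so strict=False tolerates the missing/unexpected head parameters.
state_dict = torch.load(CKPT_PATH, map_location="cpu")
model.load_state_dict(state_dict, strict=False)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

# Plain supervised fine-tuning loop on the labeled downstream dataset.
model.train()
for epoch in range(10):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```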
Keywords
Task analysis, Remote sensing, Data models, Transformers, Training, Self-supervised learning, Scene classification, Deep learning, land use and land cover classification, remote sensing, self-supervised learning (SSL)