
Correlational Image Modeling for Self-Supervised Visual Pre-Training

Computing Research Repository (CoRR), 2023

S-Lab, Nanyang Technological University

Abstract
We introduce Correlational Image Modeling (CIM), a novel and surprisingly effective approach to self-supervised visual pre-training. Our CIM performs a simple pretext task: we randomly crop image regions (exemplars) from an input image (context) and predict correlation maps between the exemplars and the context. Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task. First, to generate useful exemplar-context pairs, we consider cropping image regions with various scales, shapes, rotations, and transformations. Second, we employ a bootstrap learning framework that involves online and target encoders. During pre-training, the former takes exemplars as inputs while the latter encodes the context. Third, we model the output correlation maps via a simple cross-attention block, within which the context serves as queries and the exemplars offer keys and values. We show that CIM performs on par with or better than the current state of the art on self-supervised and transfer benchmarks. Code is available at https://github.com/weivision/Correlational-Image-Modeling.git.
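To make the pretext task concrete, the following is a minimal PyTorch sketch of the cross-attention correlation head, assuming ViT-style patch-token sequences. The class name `CorrelationHead`, the single `nn.MultiheadAttention` block, and the linear predictor are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CorrelationHead(nn.Module):
    """Minimal cross-attention block: context tokens attend to exemplar tokens.

    Following the abstract, the context features serve as queries while the
    exemplar features supply keys and values; a linear predictor then maps
    each attended context token to a correlation logit. (Illustrative sketch,
    not the authors' exact architecture.)
    """

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.predictor = nn.Linear(dim, 1)  # one correlation logit per context token

    def forward(self, context_tokens: torch.Tensor, exemplar_tokens: torch.Tensor):
        # context_tokens:  (B, N_ctx, dim), e.g. from the target encoder
        # exemplar_tokens: (B, N_ex,  dim), e.g. from the online encoder
        attended, attn_weights = self.cross_attn(
            query=context_tokens, key=exemplar_tokens, value=exemplar_tokens
        )
        corr_logits = self.predictor(attended).squeeze(-1)  # (B, N_ctx)
        return corr_logits, attn_weights

# Usage: 14x14 context patch grid, 7x7 exemplar patch grid.
head = CorrelationHead(dim=768)
ctx = torch.randn(2, 196, 768)
ex = torch.randn(2, 49, 768)
logits, weights = head(ctx, ex)  # logits can be reshaped to a 14x14 correlation map
```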
Key words
Self-supervised or unsupervised representation learning
Chat Paper

Key points: Introduces Correlational Image Modeling (CIM), a novel yet surprisingly effective method for self-supervised visual pre-training. It completes a simple pretext task: predicting correlation maps between randomly cropped regions (exemplars) of an input image and the context.
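As a concrete illustration of the cropping step, here is a hedged PyTorch/torchvision sketch of sampling one exemplar with a random scale, aspect ratio (shape), and rotation. The function name `sample_exemplar`, the parameter ranges, and the rotate-then-crop ordering are all assumptions for illustration, not the paper's exact recipe.

```python
import torch
from torchvision import transforms
from torchvision.transforms import functional as TF

def sample_exemplar(context_img: torch.Tensor,
                    scale=(0.1, 0.5), ratio=(0.5, 2.0),
                    max_rotation: float = 45.0,
                    out_size=(96, 96)):
    """Crop one exemplar from a (C, H, W) context image with a random
    area fraction (`scale`), aspect ratio (`ratio`), and rotation angle.
    All ranges here are illustrative placeholders.
    """
    # Random crop box: area fraction drawn from `scale`, shape from `ratio`.
    top, left, h, w = transforms.RandomResizedCrop.get_params(
        context_img, scale=list(scale), ratio=list(ratio))
    # Rotate the whole image first, then crop the box (a simplification).
    angle = float(torch.empty(1).uniform_(-max_rotation, max_rotation))
    rotated = TF.rotate(context_img, angle)
    exemplar = TF.resized_crop(rotated, top, left, h, w, size=list(out_size))
    # The box is the ground truth the correlation map should highlight.
    return exemplar, (top, left, h, w)

# Usage: one exemplar-context pair from a random 224x224 image.
context = torch.rand(3, 224, 224)
exemplar, box = sample_exemplar(context)
```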

Method: Three key designs make correlational image modeling a nontrivial and meaningful self-supervised task. First, exemplar-context pairs are generated by cropping image regions with various scales, shapes, rotations, and transformations. Second, a bootstrap learning framework with online and target encoders is adopted: during pre-training, the former takes exemplars as input while the latter encodes the context into features. Third, the output correlation maps are modeled by a simple cross-attention block in which the context serves as queries and the exemplars provide keys and values.
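The bootstrap design can be sketched as a BYOL/MoCo-style momentum update, where the target encoder is a gradient-free exponential moving average (EMA) of the online encoder. The helper name `ema_update` and the momentum value are assumptions; the paper's exact update rule may differ.

```python
import copy
import torch

@torch.no_grad()
def ema_update(online_encoder: torch.nn.Module,
               target_encoder: torch.nn.Module,
               momentum: float = 0.996):
    """Move each target parameter toward its online counterpart:
    target = momentum * target + (1 - momentum) * online.
    (Assumed EMA rule in the style of BYOL/MoCo; momentum is a placeholder.)
    """
    for p_online, p_target in zip(online_encoder.parameters(),
                                  target_encoder.parameters()):
        p_target.mul_(momentum).add_(p_online, alpha=1.0 - momentum)

# Setup: the target starts as a frozen copy of the online encoder.
online = torch.nn.Linear(768, 768)  # stand-in for a ViT encoder
target = copy.deepcopy(online)
for p in target.parameters():
    p.requires_grad_(False)

# Each pre-training step, after optimizing `online` on the exemplars,
# the target (which encodes the context) is updated by EMA:
ema_update(online, target)
```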

Experiments: CIM performs on par with or better than current state-of-the-art methods on self-supervised and transfer benchmarks (the specific datasets and numerical results are not named in the abstract).
