Understanding Self-Supervised Pretraining with Part-Aware Representation Learning
CoRR (2023)
Abstract
In this paper, we seek to understand self-supervised pretraining by studying
how well self-supervised representation pretraining methods learn part-aware
representations. The study is mainly motivated by the observation that
random views, used in contrastive learning, and random masked (visible)
patches, used in masked image modeling, often cover only object parts.
We explain that contrastive learning is a part-to-whole task: the projection
layer hallucinates the whole-object representation from the object-part
representation learned by the encoder. Masked image modeling, in turn, is a
part-to-part task: the masked patches of the object are hallucinated from the
visible patches. This explanation suggests that a self-supervised pretrained
encoder must understand object parts. We empirically compare the
off-the-shelf encoders pretrained with several representative methods on
object-level recognition and part-level recognition. The results show that the
fully-supervised model outperforms self-supervised models for object-level
recognition, and most self-supervised contrastive learning and masked image
modeling methods outperform the fully-supervised method for part-level
recognition. We also observe that combining contrastive learning and
masked image modeling further improves performance.
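
The motivating observation, that random views and random visible patches tend to cover only part of an object, can be illustrated with a toy simulation. The sketch below is not the paper's code: the 14x14 patch grid, the rectangular "object", the crop-size range, and the 0.75 mask ratio are all illustrative assumptions, and the helper names are hypothetical.

```python
import random

GRID = 14  # assumed 14x14 patch grid (e.g. a 224x224 image cut into 16x16 patches)

def object_patches(top, left, height, width):
    """Patch coordinates covered by a toy rectangular 'object' (an assumption)."""
    return {(r, c) for r in range(top, top + height)
                   for c in range(left, left + width)}

def random_crop_patches(rng, min_side=4, max_side=10):
    """Patches inside one random rectangular crop, standing in for a contrastive view."""
    h = rng.randint(min_side, max_side)
    w = rng.randint(min_side, max_side)
    top = rng.randint(0, GRID - h)
    left = rng.randint(0, GRID - w)
    return {(r, c) for r in range(top, top + h) for c in range(left, left + w)}

def random_visible_patches(rng, mask_ratio=0.75):
    """Patches left visible after uniform random masking, standing in for a MIM input."""
    all_patches = [(r, c) for r in range(GRID) for c in range(GRID)]
    num_visible = int(len(all_patches) * (1 - mask_ratio))
    return set(rng.sample(all_patches, num_visible))

def coverage(view, obj):
    """Fraction of the object's patches that the view contains."""
    return len(view & obj) / len(obj)

rng = random.Random(0)
obj = object_patches(top=2, left=3, height=10, width=8)

# Sample many views of each kind and measure how much of the object each one sees.
crop_cov = [coverage(random_crop_patches(rng), obj) for _ in range(1000)]
vis_cov = [coverage(random_visible_patches(rng), obj) for _ in range(1000)]

print(f"mean object coverage, random crop:     {sum(crop_cov) / len(crop_cov):.2f}")
print(f"mean object coverage, visible patches: {sum(vis_cov) / len(vis_cov):.2f}")
```

Under these assumptions, both kinds of input usually see only a fraction of the object, which is the setting in which the part-to-whole and part-to-part readings apply.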
Key words
Part-aware representation, Self-supervised learning, Masked image modeling, Contrastive learning