Learning by Reconstruction Produces Uninformative Features For Perception
CoRR(2024)
Abstract
Input space reconstruction is an attractive representation learning paradigm.
Despite the interpretability of reconstruction and generation, we identify a
misalignment between learning by reconstruction and learning for perception.
We show that the former allocates a model's capacity towards a subspace of the
data explaining the observed variance, a subspace with uninformative features
for the latter. For example, the supervised TinyImagenet task with images
projected onto the top subspace explaining 90% of the pixel variance can be
solved with 45% test accuracy. Using the bottom subspace instead, accounting
for only 20% of the pixel variance, reaches 55% test accuracy. Because the
features useful for perception are learned last, long training times are needed,
e.g., with Masked Autoencoders. Learning by denoising is a popular strategy to
alleviate that misalignment. We prove that while some noise strategies such as
masking are indeed beneficial, others such as additive Gaussian noise are not.
Yet, even in the case of masking, we find that the benefits vary as a function
of the mask's shape, ratio, and the considered dataset. While tuning the noise
strategy without knowledge of the perception task seems challenging, we provide
first clues on how to detect if a noise strategy is never beneficial regardless
of the perception task.
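
The subspace experiment described in the abstract can be illustrated with a minimal sketch, assuming PCA is used to rank directions by explained pixel variance. The function name `split_by_variance`, the single complementary cutoff, and the synthetic data are assumptions made for illustration; the abstract's 90% and 20% subspaces overlap and are not reproduced here, and this is not the authors' code.

```python
import numpy as np

def split_by_variance(X, top_fraction=0.9):
    """Project rows of X onto the top PCA subspace explaining at least
    `top_fraction` of the variance, and onto the complementary bottom subspace.
    (Hypothetical helper, not taken from the paper.)"""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the cutoff.
    k = min(int(np.searchsorted(explained, top_fraction)) + 1, len(s))
    V_top, V_bot = Vt[:k], Vt[k:]
    # Project onto each subspace, then map back to pixel space.
    X_top = Xc @ V_top.T @ V_top + mean
    X_bot = Xc @ V_bot.T @ V_bot + mean
    return X_top, X_bot, k

rng = np.random.default_rng(0)
# Toy stand-in for flattened images: 200 samples, 64 "pixels" with decaying scale.
X = rng.normal(size=(200, 64)) * np.linspace(5.0, 0.1, 64)
X_top, X_bot, k = split_by_variance(X, top_fraction=0.9)
```

Training the same classifier on `X_top` and on `X_bot` and comparing test accuracy would mirror the comparison the abstract reports; the classifier and dataset are left out of this sketch.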