Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis
CoRR (2024)
Abstract
In the transfer learning paradigm, models learn useful representations (or
features) during a data-rich pretraining stage, and then use the pretrained
representation to improve model performance on data-scarce downstream tasks. In
this work, we explore transfer learning with the goal of optimizing downstream
performance. We introduce a simple linear model that takes as input an
arbitrary pretrained feature transform. We derive exact asymptotics of the
downstream risk and its fine-grained bias-variance decomposition. Our findings
suggest that using the ground-truth featurization can result in
"double-divergence" of the asymptotic risk, indicating that it is not
necessarily optimal for downstream performance. We then identify the optimal
pretrained representation by minimizing the asymptotic downstream risk averaged
over an ensemble of downstream tasks. Our analysis reveals the relative
importance of learning the task-relevant features and structures in the data
covariates and characterizes how each contributes to controlling the downstream
risk from a bias-variance perspective. Moreover, we uncover a phase transition
phenomenon where the optimal pretrained representation transitions from hard to
soft selection of relevant features and discuss its connection to principal
component regression.