A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias

ICLR 2023

Advances in the expressivity of large-scale pretrained models have increased interest in the design of adaptation protocols that enable safe and effective transfer learning. Going beyond conventional linear probing (LP) and fine-tuning (FT) strategies, protocols that can effectively control feature distortion, i.e., the failure to update features orthogonal to the in-distribution, during FT have been found to achieve improved out-of-distribution (OOD) generalization. A popular example is the recent LP+FT protocol, which first learns a linear probe and then uses it as the initialization for FT. However, in this paper, we find that when adaptation protocols are also evaluated on a variety of safety objectives (e.g., calibration and robustness), a perspective complementary to feature distortion is required to explain protocol behavior. To this end, we study the susceptibility of protocols to simplicity bias (SB), i.e., the well-known propensity of neural networks to rely upon simple features, as SB has recently been shown to underlie several problems in robust generalization. Using a synthetic dominoes dataset obtained by pairing (complex) CIFAR10 samples with (simple) MNIST samples, we demonstrate the susceptibility of existing protocols to SB. Given the strong effectiveness of LP+FT, we propose incorporating hardness-promoting perturbations during LP to obtain initializations for FT that further decrease SB. We verify the effectiveness of these modified LP+FT protocols by showing that they decrease SB on the dominoes dataset and jointly improve OOD generalization and safety on standard adaptation benchmarks.
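To make the two central ingredients of the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows (1) one plausible way to build a "domino" by stacking a simple image on top of a complex one, and (2) the LP+FT schedule on a toy model, where a linear head is first trained on frozen features and then used to initialize full fine-tuning. Everything here is an assumption made for illustration: the function names (`make_domino`, `train`, `lp_then_ft`), the vertical stacking order and zero padding, the one-layer tanh "backbone", the binary logistic loss, and all hyperparameters. The hardness-promoting perturbations proposed in the paper are not sketched.

```python
import numpy as np

def make_domino(simple_img, complex_img):
    """Stack a 28x28 grayscale image above a 32x32x3 image (assumed layout)."""
    padded = np.pad(simple_img, ((2, 2), (2, 2)))            # 28x28 -> 32x32, zero padding
    simple_rgb = np.repeat(padded[..., None], 3, axis=-1)    # 32x32 -> 32x32x3
    return np.concatenate([simple_rgb, complex_img], axis=0) # 64x32x3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, W, v, steps, lr, update_backbone):
    """Gradient descent on mean binary cross-entropy for logits = tanh(X W) v.

    W plays the role of the pretrained feature extractor, v of the linear head.
    With update_backbone=False only the head is trained (linear probing).
    """
    W, v = W.copy(), v.copy()
    for _ in range(steps):
        H = np.tanh(X @ W)                 # features, shape (n, h)
        p = sigmoid(H @ v)                 # predicted probabilities, shape (n,)
        g = (p - y) / len(y)               # gradient of the loss w.r.t. the logits
        grad_v = H.T @ g                   # head gradient, shape (h,)
        if update_backbone:
            grad_H = np.outer(g, v) * (1.0 - H ** 2)  # backprop through tanh
            W -= lr * (X.T @ grad_H)       # backbone gradient step
        v -= lr * grad_v                   # head gradient step
    return W, v

def lp_then_ft(X, y, W_pretrained, lp_steps=200, ft_steps=200, lr=0.1):
    """LP+FT: train the head on frozen features, then fine-tune everything."""
    v0 = np.zeros(W_pretrained.shape[1])
    _, v_lp = train(X, y, W_pretrained, v0, lp_steps, lr, update_backbone=False)
    return train(X, y, W_pretrained, v_lp, ft_steps, lr, update_backbone=True)
```

In this toy setting, plain FT would correspond to calling `train(..., update_backbone=True)` directly from a fresh head, whereas LP+FT starts fine-tuning from a head that already fits the frozen features.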
Keywords: Transfer Learning, Robustness, Adaptation, Data Augmentation