Improving Black-box Robustness with In-Context Rewriting
CoRR (2024)
Abstract
Machine learning models often excel on in-distribution (ID) data but struggle
with unseen out-of-distribution (OOD) inputs. Most techniques for improving OOD
robustness are not applicable to settings where the model is effectively a
black box, such as when the weights are frozen, retraining is costly, or the
model is leveraged via an API. Test-time augmentation (TTA) is a simple
post-hoc technique for improving robustness that sidesteps black-box
constraints by aggregating predictions across multiple augmentations of the
test input. TTA has seen limited use in NLP due to the challenge of generating
effective natural language augmentations. In this work, we propose LLM-TTA,
which uses LLM-generated augmentations as TTA's augmentation function. LLM-TTA
outperforms conventional augmentation functions across sentiment, toxicity, and
news classification tasks for BERT and T5 models, with BERT's OOD robustness
improving by an average of 4.30 percentage points without regressing average ID
performance. We explore selectively augmenting inputs based on prediction
entropy to reduce the rate of expensive LLM augmentations, allowing us to
maintain performance gains while reducing the average number of generated
augmentations by 57.76%. LLM-TTA does not require OOD labels and is
effective across low- and high-resource settings.
settings. We share our data, models, and code for reproducibility.
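The abstract's core mechanism, aggregating predictions over multiple rewrites of a test input and only invoking the expensive augmentation step when the model's prediction entropy is high, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predict_proba`, `augment`, the augmentation count, and the entropy threshold are all hypothetical stand-ins (in LLM-TTA, `augment` would be an LLM-based rewriter).

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def tta_predict(predict_proba, augment, text, n_aug=4, entropy_threshold=0.5):
    """Test-time augmentation with entropy-based selective augmentation.

    predict_proba: callable, text -> list of class probabilities (hypothetical)
    augment: callable, text -> list of rewritten texts (e.g., LLM rewrites)

    Inputs whose original prediction entropy is at or below the threshold are
    returned as-is, skipping the costly augmentation call; uncertain inputs
    are augmented, and class probabilities are averaged across all variants.
    """
    original = predict_proba(text)
    if entropy(original) <= entropy_threshold:
        return original  # confident prediction: no augmentation needed
    variants = augment(text)[:n_aug]
    all_probs = [original] + [predict_proba(v) for v in variants]
    n_classes = len(original)
    # aggregate by averaging each class's probability over all variants
    return [sum(p[i] for p in all_probs) / len(all_probs)
            for i in range(n_classes)]
```

The entropy gate is what lets the method cut the number of generated augmentations (the 57.76% reduction reported above) while keeping the robustness gains: confident predictions bypass the LLM entirely.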