An Empirical Study Into What Matters for Calibrating Vision-Language Models
CoRR (2024)
Abstract
Vision–Language Models (VLMs) have emerged as the dominant approach for
zero-shot recognition, adept at handling diverse scenarios and significant
distribution changes. However, their deployment in risk-sensitive areas
requires a deeper understanding of their uncertainty estimation capabilities, a
relatively uncharted area. In this study, we explore the calibration properties
of VLMs across different architectures, datasets, and training strategies. In
particular, we analyze the uncertainty estimation performance of VLMs when
calibrated in one domain, label set, or hierarchy level, and tested in a
different one. Our findings reveal that while VLMs are not inherently
calibrated for uncertainty, temperature scaling significantly and consistently
improves calibration, even across shifts in distribution and changes in label
set. Moreover, VLMs can be calibrated with a very small set of examples.
Through detailed experimentation, we highlight the potential applications and
importance of our insights, aiming for more reliable and effective use of VLMs
in critical, real-world scenarios.
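The temperature scaling the abstract refers to is a standard post-hoc calibration method: all logits are divided by a single scalar T fitted on a held-out calibration set, which changes confidence but not predictions. A minimal sketch (the synthetic "overconfident" data, the grid search, and the `fit_temperature` helper are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; T > 1 softens confidences, T < 1 sharpens.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the true labels under temperature T.
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.05, 10.0, 200)):
    # Fit a single scalar T on a (possibly very small) calibration set
    # by minimizing NLL; a grid search stands in for gradient-based fitting.
    losses = [nll(logits, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

# Toy overconfident classifier: 70% accuracy but ~0.9999 confidence.
rng = np.random.default_rng(0)
n, k = 1000, 10
labels = rng.integers(0, k, size=n)
pred = labels.copy()
wrong = rng.random(n) < 0.3                      # 30% of predictions are wrong
pred[wrong] = (labels[wrong] + 1) % k
logits = np.zeros((n, k))
logits[np.arange(n), pred] = 10.0                # large margin -> overconfidence

T_hat = fit_temperature(logits, labels)
print(f"fitted T = {T_hat:.2f}")
print(f"NLL before: {nll(logits, labels, 1.0):.3f}, after: {nll(logits, labels, T_hat):.3f}")
```

Because the model above is overconfident, the fitted temperature comes out greater than 1 and the calibration-set NLL drops; the argmax predictions (and hence accuracy) are unchanged, which is why the method can be applied safely even across the distribution and label-set shifts the paper studies.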