GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks.
CoRR (2023)
Abstract
Automatically evaluating vision-language tasks is challenging, especially
when the evaluation should reflect human judgments, because automatic methods
are limited in accounting for fine-grained details. Although GPT-4V has shown promising results in
various multi-modal tasks, leveraging GPT-4V as a generalist evaluator for
these tasks has not yet been systematically explored. We comprehensively
validate GPT-4V's capabilities for evaluation purposes, addressing tasks
ranging from foundational image-to-text and text-to-image synthesis to
high-level image-to-image translation and multi-image-to-text alignment. We
employ two evaluation methods, single-answer grading and pairwise comparison,
using GPT-4V. Notably, GPT-4V shows promising agreement with humans across
various tasks and evaluation methods, demonstrating immense potential for
multi-modal LLMs as evaluators. Despite limitations such as restricted visual
clarity grading and complex real-world reasoning, its ability to provide
human-aligned scores enriched with detailed explanations is promising for
a universal automatic evaluator.
Keywords
generalist evaluator, vision-language