PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
arXiv (2024)
Abstract
The emergence of large multimodal models has unlocked remarkable potential in
AI, particularly in pathology. However, the lack of a specialized, high-quality
benchmark has impeded their development and precise evaluation. To address this, we
introduce PathMMU, the largest and highest-quality expert-validated pathology
benchmark for Large Multimodal Models (LMMs). It comprises 33,428 multimodal
multi-choice questions and 24,067 images from various sources, each accompanied
by an explanation for the correct answer. The construction of PathMMU harnesses
GPT-4V's advanced capabilities, utilizing over 30,000 image-caption pairs to
enrich captions and generate corresponding Q&As in a cascading process.
Significantly, to maximize PathMMU's authority, we invite seven pathologists to
scrutinize each question under strict standards in PathMMU's validation and
test sets, while simultaneously setting an expert-level performance benchmark
for PathMMU. We conduct extensive evaluations, including zero-shot assessments
of 14 open-sourced and 4 closed-sourced LMMs and their robustness to image
corruption. We also fine-tune representative LMMs to assess their adaptability
to PathMMU. The empirical findings indicate that advanced LMMs struggle with
the challenging PathMMU benchmark, with the top-performing LMM, GPT-4V,
achieving only 49.8% zero-shot performance, significantly lower than the 71.8%
demonstrated by human pathologists. After fine-tuning, significantly
smaller open-sourced LMMs can outperform GPT-4V but still fall short of the
expertise shown by pathologists. We hope that PathMMU will offer valuable
insights and foster the development of more specialized, next-generation LMMs
for pathology.