OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
CoRR(2024)
Abstract
Large Vision-Language Models (LVLMs) have demonstrated remarkable
capabilities in various multimodal tasks. However, their potential in the
medical domain remains largely unexplored. A significant challenge arises from
the scarcity of diverse medical images spanning various modalities and
anatomical regions, which is essential in real-world medical applications. To
solve this problem, in this paper, we introduce OmniMedVQA, a novel
comprehensive medical Visual Question Answering (VQA) benchmark. This benchmark
is collected from 75 different medical datasets, including 12 different
modalities and covering more than 20 distinct anatomical regions. Importantly,
all images in this benchmark are sourced from authentic medical scenarios,
ensuring alignment with the requirements of the medical field and suitability
for evaluating LVLMs. Through extensive experiments, we find that existing
LVLMs struggle to address these medical VQA problems effectively.
Surprisingly, medical-specialized LVLMs even perform worse than
general-domain models, calling for more versatile and robust LVLMs in the
biomedical field. The evaluation results not only reveal the current
limitations of LVLMs in understanding real medical images but also highlight
our dataset's significance. Our dataset will be made publicly available.