MM-SAP: A Comprehensive Benchmark for Assessing Self-Awareness of Multimodal Large Language Models in Perception
CoRR (2024)
Abstract
Multimodal Large Language Models (MLLMs) have recently shown remarkable
abilities in visual perception and understanding. However, comprehensively
evaluating the capabilities of MLLMs remains a challenge. Most existing
benchmarks focus on assessing perception, cognition, and reasoning, and
neglect self-awareness, that is, the model's recognition of its own
capability boundary. In our study, we focus on
self-awareness in image perception and introduce the knowledge quadrant for
MLLMs, which clearly defines the knowns and unknowns in perception. Based on
this, we propose a novel benchmark specifically designed to evaluate the
Self-Aware capabilities in Perception for MLLMs (MM-SAP). MM-SAP encompasses
three distinct sub-datasets, each focusing on a different aspect of
self-awareness. We evaluated eight well-known MLLMs using MM-SAP, analyzing
their self-awareness and providing detailed insights. Code and data are
available at https://github.com/YHWmz/MM-SAP