CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
CoRR(2024)
摘要
As the capabilities of large multimodal models (LMMs) continue to advance,
evaluating the performance of LMMs emerges as an increasing need. Additionally,
there is an even larger gap in evaluating the advanced knowledge and reasoning
abilities of LMMs in non-English contexts such as Chinese. We introduce CMMMU,
a new Chinese Massive Multi-discipline Multimodal Understanding benchmark
designed to evaluate LMMs on tasks demanding college-level subject knowledge
and deliberate reasoning in a Chinese context. CMMMU is inspired by and
strictly follows the annotation and analysis pattern of MMMU.
CMMMU includes 12k manually collected multimodal questions from college
exams, quizzes, and textbooks, covering six core disciplines: Art Design,
Business, Science, Health Medicine, Humanities Social Science, and Tech
Engineering, like its companion, MMMU. These questions span 30 subjects and
comprise 39 highly heterogeneous image types, such as charts, diagrams, maps,
tables, music sheets, and chemical structures.
CMMMU focuses on complex perception and reasoning with domain-specific
knowledge in the Chinese context. We evaluate 11 open-source LLMs and one
proprietary GPT-4V(ision). Even GPT-4V only achieves accuracies of 42
indicating a large space for improvement. CMMMU will boost the community to
build the next-generation LMMs towards expert artificial intelligence and
promote the democratization of LMMs by providing diverse language contexts.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要