KMMLU: Measuring Massive Multitask Language Understanding in Korean
CoRR (2024)
Abstract
We propose KMMLU, a new Korean benchmark with 35,030 expert-level
multiple-choice questions across 45 subjects ranging from humanities to STEM.
Unlike previous Korean benchmarks that are translated from existing English
benchmarks, KMMLU is collected from original Korean exams, capturing linguistic
and cultural aspects of the Korean language. We test 26 publically available
and proprietary LLMs, identifying significant room for improvement. The best
publicly available model achieves 50.54
performance of 62.6
not Korean. Current LLMs tailored to Korean, such as Polyglot-Ko, perform far
worse. Surprisingly, even the most capable proprietary LLMs, e.g., GPT-4 and
HyperCLOVA X, achieve 59.95
further work is needed to improve Korean LLMs, and KMMLU offers the right tool
to track this progress. We make our dataset publicly available on the Hugging
Face Hub and integrate the benchmark into EleutherAI's Language Model
Evaluation Harness.
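Benchmarks like KMMLU are scored as multiple-choice accuracy: the model's selected option is compared against the answer key for each question, and the fraction correct is reported. A minimal sketch of that metric is shown below; the question keys and predictions are hypothetical stand-ins, not actual KMMLU data.

```python
# Minimal sketch of the multiple-choice accuracy metric used to score
# benchmarks such as KMMLU. The answer keys and predictions below are
# illustrative placeholders, not real KMMLU items.

def accuracy(predictions, gold):
    """Fraction of questions where the predicted option matches the key."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

gold = ["A", "C", "B", "D"]          # hypothetical answer keys
predictions = ["A", "C", "D", "D"]   # hypothetical model choices
print(accuracy(predictions, gold))   # → 0.75
```

In practice, running the benchmark through EleutherAI's Language Model Evaluation Harness handles this scoring (and the prompt formatting that precedes it) automatically.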