MERA: A Comprehensive LLM Evaluation in Russian
CoRR(2024)
摘要
Over the past few years, one of the most notable advancements in AI research
has been in foundation models (FMs), headlined by the rise of language models
(LMs). As the models' size increases, LMs demonstrate enhancements in
measurable aspects and the development of new qualitative features. However,
despite researchers' attention and the rapid growth in LM application, the
capabilities, limitations, and associated risks still need to be better
understood. To address these issues, we introduce an open Multimodal Evaluation
of Russian-language Architectures (MERA), a new instruction benchmark for
evaluating foundation models oriented towards the Russian language. The
benchmark encompasses 21 evaluation tasks for generative models in 11 skill
domains and is designed as a black-box test to ensure the exclusion of data
leakage. The paper introduces a methodology to evaluate FMs and LMs in zero-
and few-shot fixed instruction settings that can be extended to other
modalities. We propose an evaluation methodology, an open-source code base for
the MERA assessment, and a leaderboard with a submission system. We evaluate
open LMs as baselines and find that they are still far behind the human level.
We publicly release MERA to guide forthcoming research, anticipate
groundbreaking model features, standardize the evaluation procedure, and
address potential societal drawbacks.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要