KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark
CoRR (2024)
Abstract
As language models are often deployed as chatbot assistants, it is increasingly
important for models to engage in conversations in a user's first language. While
these models are trained on a wide range of languages, a comprehensive
evaluation of their proficiency in low-resource languages such as Korean has
been lacking. In this work, we introduce KoDialogBench, a benchmark designed to
assess language models' conversational capabilities in Korean. To this end, we
collect native Korean dialogues on daily topics from public sources, or
translate dialogues from other languages. We then structure these conversations
into diverse test datasets, spanning from dialogue comprehension to response
selection tasks. Leveraging the proposed benchmark, we conduct extensive
evaluations and analyses of various language models to measure a foundational
understanding of Korean dialogues. Experimental results indicate that there
exists significant room for improvement in models' conversation skills.
Furthermore, our in-depth comparisons across different language models
highlight the effectiveness of recent training techniques in enhancing
conversational proficiency. We anticipate that KoDialogBench will promote
progress toward conversation-aware Korean language models.