BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP
arXiv (2023)
Abstract
Large Language Models (LLMs) have emerged as one of the most important
breakthroughs in NLP for their impressive skills in language generation and
other language-specific tasks. Though LLMs have been evaluated in various
tasks, mostly in English, they have not yet undergone thorough evaluation in
under-resourced languages such as Bengali (Bangla). To this end, this paper
introduces BenLLM-Eval, a comprehensive evaluation of LLMs that benchmarks
their performance in Bengali, a language with modest resources.
In this regard, we select various important and diverse Bengali NLP tasks, such
as text summarization, question answering, paraphrasing, natural language
inference, transliteration, text classification, and sentiment analysis for
zero-shot evaluation of popular LLMs, namely, GPT-3.5, LLaMA-2-13b-chat, and
Claude-2. Our experimental results demonstrate that while zero-shot LLMs can
achieve performance on par with, or even better than, current SOTA fine-tuned
models in some Bengali NLP tasks, in most tasks their performance is quite poor
(with open-source LLMs such as LLaMA-2-13b-chat performing significantly worse)
compared to current SOTA results. These findings call for further efforts to
develop a better understanding of LLMs in modest-resourced languages like
Bengali.