CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
CoRR (2024)
Abstract
Recently, the advent of large language models (LLMs) has revolutionized
generative agents. Among them, Role-Playing Conversational Agents (RPCAs)
attract considerable attention due to their ability to emotionally engage
users. However, the absence of a comprehensive benchmark impedes progress in
this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark
for comprehensive RPCA assessment, complemented by a tailored high-quality
dataset. The dataset comprises 1,785 multi-turn role-playing dialogues,
encompassing 23,020 examples and featuring 77 characters derived from Chinese
novels and scripts. It was carefully constructed, beginning with initial
dialogue extraction via GPT-4, followed by rigorous human-led quality control,
and enhanced with in-depth character profiles sourced from Baidu Baike.
CharacterEval employs a multifaceted evaluation approach, encompassing thirteen
targeted metrics on four dimensions. Comprehensive experiments on CharacterEval
demonstrate that Chinese LLMs exhibit more promising capabilities than GPT-4 in
Chinese role-playing conversation. The source code, data source, and reward model
will be publicly accessible at https://github.com/morecry/CharacterEval.