NewsBench: Systematic Evaluation of LLMs for Writing Proficiency and Safety Adherence in Chinese Journalistic Editorial Applications
arXiv (2024)
Abstract
This study presents NewsBench, a novel benchmark framework developed to
evaluate the capability of Large Language Models (LLMs) in Chinese Journalistic
Writing Proficiency (JWP) and their Safety Adherence (SA), addressing the gap
between journalistic ethics and the risks associated with AI utilization.
Comprising 1,267 tasks across 5 editorial applications, 7 aspects (including
safety and journalistic writing with 4 detailed facets), and 24 news topic
domains, NewsBench employs two GPT-4-based automatic evaluation
protocols validated by human assessment. Our comprehensive analysis of 11 LLMs
highlights GPT-4 and ERNIE Bot as top performers, yet reveals a relative
deficiency in adherence to journalistic ethics during creative writing tasks. These
findings underscore the need for enhanced ethical guidance in AI-generated
journalistic content, marking a step forward in aligning AI capabilities with
journalistic standards and safety considerations.