Art or Artifice? Large Language Models and the False Promise of Creativity
CoRR (2023)
Abstract
Researchers have argued that large language models (LLMs) exhibit
high-quality writing capabilities, from blogs to stories. However, objectively
evaluating the creativity of a piece of writing is challenging. Inspired by
the Torrance Test of Creative Thinking (TTCT), which measures creativity as a
process, we use the Consensual Assessment Technique [3] and propose the
Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product.
TTCW consists of 14 binary tests organized into the original dimensions of
Fluency, Flexibility, Originality, and Elaboration. We recruit 10 creative
writers and use TTCW to conduct a human assessment of 48 stories written either
by professional authors or by LLMs. Our analysis shows that LLM-generated
stories pass 3-10X fewer TTCW tests than stories written by professionals. In
addition, we explore the use of LLMs as assessors to automate the TTCW
evaluation, revealing that none of the LLMs positively correlate with the
expert assessments.
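
As a rough illustration of the scoring described above, the sketch below counts how many of the 14 binary tests each story passes, compares the mean counts of two groups of stories, and measures agreement between an expert and an LLM assessor. Everything here is an assumption made for illustration: the function names are hypothetical, the pass/fail values are invented toy data rather than results from the paper, and Cohen's kappa is used only as one plausible agreement statistic, since the abstract does not say which measure the authors used.

```python
from statistics import mean

NUM_TESTS = 14  # the abstract states that TTCW has 14 binary tests


def passed_count(results):
    """Number of TTCW tests a single story passes (hypothetical helper)."""
    if len(results) != NUM_TESTS:
        raise ValueError(f"expected {NUM_TESTS} binary results, got {len(results)}")
    return sum(1 for r in results if r)


def mean_passed(stories):
    """Average number of passed tests over a group of stories."""
    return mean(passed_count(s) for s in stories)


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of binary labels.

    Assumed agreement statistic; not confirmed by the abstract.
    """
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if bool(x) == bool(y)) / n
    pa, pb = sum(map(bool, a)) / n, sum(map(bool, b)) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # chance agreement
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data only: two "professional" and two "LLM" stories.
professional = [[True] * 12 + [False] * 2, [True] * 10 + [False] * 4]
llm = [[True] * 2 + [False] * 12, [True] * 4 + [False] * 10]

ratio = mean_passed(professional) / mean_passed(llm)
print(f"professional stories pass {ratio:.2f}x as many tests (toy data)")

# Toy expert vs. LLM-assessor labels on the same four tests.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 1]))
```

A kappa near zero or below, as in the toy labels above, would mean the LLM assessor agrees with the expert no better than chance.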
Keywords
creativity, large language, large language models, art