A Computational Framework for Behavioral Assessment of LLM Therapists
CoRR(2024)
摘要
The emergence of ChatGPT and other large language models (LLMs) has greatly
increased interest in utilizing LLMs as therapists to support individuals
struggling with mental health challenges. However, due to the lack of
systematic studies, our understanding of how LLM therapists behave, i.e., ways
in which they respond to clients, is significantly limited. Understanding their
behavior across a wide range of clients and situations is crucial to accurately
assess their capabilities and limitations in the high-risk setting of mental
health, where undesirable behaviors can lead to severe consequences. In this
paper, we propose BOLT, a novel computational framework to study the
conversational behavior of LLMs when employed as therapists. We develop an
in-context learning method to quantitatively measure the behavior of LLMs based
on 13 different psychotherapy techniques including reflections, questions,
solutions, normalizing, and psychoeducation. Subsequently, we compare the
behavior of LLM therapists against that of high- and low-quality human therapy,
and study how their behavior can be modulated to better reflect behaviors
observed in high-quality therapy. Our analysis of GPT and Llama-variants
reveals that these LLMs often resemble behaviors more commonly exhibited in
low-quality therapy rather than high-quality therapy, such as offering a higher
degree of problem-solving advice when clients share emotions, which is against
typical recommendations. At the same time, unlike low-quality therapy, LLMs
reflect significantly more upon clients' needs and strengths. Our analysis
framework suggests that despite the ability of LLMs to generate anecdotal
examples that appear similar to human therapists, LLM therapists are currently
not fully consistent with high-quality care, and thus require additional
research to ensure quality care.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要