Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models
arXiv (2024)
Abstract
Accurate and interpretable user satisfaction estimation (USE) is critical for
understanding, evaluating, and continuously improving conversational systems.
Users express their satisfaction or dissatisfaction with diverse conversational
patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented
(customer service chatbot) conversational systems. Existing approaches based on
featurized ML models or text embeddings fall short in extracting generalizable
patterns and are hard to interpret. In this work, we show that LLMs can extract
interpretable signals of user satisfaction from users' natural language
utterances more effectively than embedding-based approaches. Moreover, an LLM
can be tailored for USE via an iterative prompting framework using supervision
from labeled examples. The resulting method, Supervised Prompting for User
satisfaction Rubrics (SPUR), not only achieves higher accuracy but is also more
interpretable, as it scores user satisfaction via learned rubrics with a
detailed breakdown.