Assessing user simulation for dialog systems using human judges and automatic evaluation measures

Natural Language Engineering (2011)

Abstract
While different user simulations are built to assist dialog system development, there is an increasing need to assess the quality of these simulations quickly and reliably. Previous studies have proposed several automatic evaluation measures for this purpose, but the validity of these measures has not been fully established. We present an assessment study in which human judgments of user simulation quality are collected as the gold standard for validating automatic evaluation measures. We show that a ranking model built from the automatic measures can predict rankings of the simulations in the same order as the human judgments. We further show that the ranking model can be improved by adding a simple feature derived from time-series analysis.
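The abstract does not spell out how the ranking model is constructed. Below is a minimal, hypothetical Python sketch of one standard way to build such a model: a pairwise learning-to-rank scheme over automatic measures, evaluated against human rankings with Kendall's tau. The data values, the choice of features (including the time-series column), and the use of logistic regression are illustrative assumptions, not the authors' actual method.

```python
import numpy as np
from itertools import combinations
from scipy.stats import kendalltau
from sklearn.linear_model import LogisticRegression

# Hypothetical automatic measures for four user simulations; columns
# could be, e.g., dialog-act precision, recall, and a time-series
# feature summarizing how simulated behavior evolves over turns.
X = np.array([
    [0.82, 0.75, 0.30],
    [0.70, 0.68, 0.45],
    [0.55, 0.60, 0.62],
    [0.40, 0.52, 0.80],
])
human_rank = np.array([0, 1, 2, 3])  # gold standard: 0 = judged best

# Pairwise transform: learn which of two simulations humans rank higher.
pairs, labels = [], []
for i, j in combinations(range(len(X)), 2):
    pairs.append(X[i] - X[j])
    labels.append(int(human_rank[i] < human_rank[j]))
    pairs.append(X[j] - X[i])  # add both directions so that
    labels.append(int(human_rank[j] < human_rank[i]))  # both classes appear

# No intercept: a zero feature difference should imply no preference.
clf = LogisticRegression(fit_intercept=False)
clf.fit(np.array(pairs), np.array(labels))

# Score each simulation with the learned weights and rank by score.
scores = X @ clf.coef_.ravel()
predicted_rank = np.argsort(np.argsort(-scores))
tau, _ = kendalltau(predicted_rank, human_rank)
print(f"Kendall's tau vs. human ranking: {tau:.2f}")
```

Under this pairwise framing, adding a time-series feature is just another column of X; the paper's reported improvement suggests such temporal summaries carry ranking signal that static aggregate measures miss.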
Keywords
user simulation, dialog system development, automatic evaluation measure, human judgment, assessment, ranking model