How to Evaluate Single-Round Dialogues Like Humans: An Information-oriented Metric

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020)

Abstract
Developing a dialogue response generation system is an important topic in natural language processing, but many obstacles must still be overcome before automatically generated dialogues of human-like quality become possible. A good evaluation method would help narrow the gap between machines and humans in dialogue generation. Unfortunately, existing automatic evaluation methods are biased and correlate very poorly with human judgments of response quality. Such methods cannot assess whether a dialogue response generation system produces high-quality, knowledge-related, and informative dialogues. In response to this challenge, we design an information-oriented framework that simulates human subjective evaluation. Using this framework, we implement a learning-based metric for evaluating dialogue quality. Experimental validation demonstrates the proposed metric's effectiveness in dialogue selection and model evaluation on a Twitter dataset (in English) and a Weibo dataset (in Chinese). In addition, the metric correlates more strongly with human subjective judgment than existing dialogue evaluation methods.
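To make the evaluation setting concrete, the following is a minimal, purely illustrative sketch (not the paper's model) of the general workflow such a metric supports: assign a quality score to each context-response pair, then measure how well the scores correlate with human ratings. The scoring rule, the function names (score_response, spearman_rank_corr), and the example data below are all assumptions introduced here for illustration only.

```python
# Illustrative stand-in for an automatic dialogue-quality metric.
# NOT the paper's information-oriented, learning-based metric; it only shows
# the evaluation workflow: score context-response pairs, then correlate the
# scores with human judgments.
from collections import Counter
from math import sqrt


def _bow(text: str) -> Counter:
    """Lower-cased bag-of-words representation of a sentence."""
    return Counter(text.lower().split())


def score_response(context: str, response: str) -> float:
    """Toy score in [0, 1]: word-overlap cosine between context and response,
    mixed with the share of response tokens not in the context
    (a crude proxy for 'informativeness')."""
    c, r = _bow(context), _bow(response)
    if not c or not r:
        return 0.0
    overlap = sum(c[w] * r[w] for w in r)
    cosine = overlap / (sqrt(sum(v * v for v in c.values())) *
                        sqrt(sum(v * v for v in r.values())))
    novelty = len(set(r) - set(c)) / len(set(r))
    return 0.5 * cosine + 0.5 * novelty


def spearman_rank_corr(xs, ys):
    """Spearman correlation between metric scores and human ratings
    (simple rank-based version; ties are not specially handled)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    context = "where should we go for dinner tonight"
    responses = ["how about the new ramen place downtown", "i do not know"]
    human_ratings = [4.5, 1.5]  # invented ratings, for illustration only
    scores = [score_response(context, r) for r in responses]
    print(scores, spearman_rank_corr(scores, human_ratings))
```

A learned metric would replace score_response with a trained model, but the comparison against human ratings via rank correlation proceeds the same way.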
Keywords
Chatbot, dialogue evaluation, information extraction, attention mechanism