Double-Ended Prediction Of The Naturalness Ratings Of The Blizzard Challenge 2008-2013

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5(2015)

引用 24|浏览13
暂无评分
摘要
In this paper we describe a double-ended (i.e. reference-based or intrusive) approach to objective quality estimation of synthetic speech that uses a linear regression model whose parameters can easily be interpreted. The model was trained and evaluated on English data from the 2008 to 2013 Blizzard Challenges (BC) [1], which is the largest publically available resource of listener-evaluated synthetic speech. To our knowledge, this is the first attempt to train and evaluate a speech quality predictor on the whole data set. Predicting the naturalness of the different participating systems in the BC is not an easy task because some of the systems are quite close in quality. Our best results correspond to a Pearson correlation coefficient of 0.60 and 0.84 for sentences and systems, respectively, using a leave-one-system out evaluation, which by far outperformed the ITU-T standard PESQ [2] for double-ended speech quality evaluation on this data.
更多
查看译文
关键词
speech-quality prediction, speech synthesis, intrusive quality assessment, Blizzard Challenge
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要