Towards a Neuro-Inspired No-Reference Instrumental Quality Measure for Text-to-Speech Systems

2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX)(2018)

Abstract
Subjective evaluation of synthesized speech is not an easy task, as various quality dimensions can be affected, including naturalness, prosody, pronunciation, and continuity, to name a few. Evaluations typically rely on naive listeners, who more closely represent the consumers of commercial products. As such, while the results of these costly and time-consuming tests may give text-to-speech (TTS) system developers feedback on the perceived quality and acceptability of their devices, they provide little information about the source of the problems or what can be done about them. In this paper, we propose the use of neuroimaging to probe the unconscious cognitive processing of naive listeners as they listen to synthesized speech generated by different systems of varying quality. The obtained neural insights allowed us to extract a small subset of highly relevant features from the speech signals and to use these features to build a simple, no-reference instrumental quality metric tailored specifically to TTS speech. The metric is tested on an unseen dataset and shown to significantly outperform a benchmark algorithm.
Keywords
quality dimensions,naive listeners,consumers,commercial products,costly time,text-to-speech system developers,perceived quality,unconscious cognitive processing,synthesized speech,varying quality,speech signals,no-reference instrumental quality metric,TTS speech,Neuro-inspired No-reference Instrumental Quality Measure,text-to-speech systems,subjective evaluation