Interpretable Control for Emotional Text-to-Speech System toward Development of Sympathetic Educational-Support Robots

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022

Abstract
As the population ages, sympathetic educational-support robots (sympathetic robots) that can assist young learners have been attracting the attention of researchers. However, the visually driven interactions studied so far (emotional body motions and facial expressions of robots) do not provide sufficiently good interactivity and may distract learners. In this paper, we develop an emotional text-to-speech (TTS) synthesis system for sympathetic robots. Because the system is intended for robots that speak and express their own emotions by voice, control over variable emotional expression in the synthesized speech during interaction needs to be fully considered. Toward sympathetic robots with sufficiently good interactivity, we propose an emotional TTS architecture that uses both a global style tokens (GSTs) module and a set of arousal-valence tokens to flexibly control the emotional expression of synthesized speech through two interpretable annotations, categorical and dimensional, respectively. The experimental results demonstrate that our model can flexibly control the emotional expression of the synthesized speech and can meet the demands of application to sympathetic robots.
Keywords
educational-support robots, speech synthesis, prosody control, human-robot interaction