Intelligibility of synthetic words generated by transformation of a sequence of discrete acoustic events into modulation of the vocal tract shape

The Journal of the Acoustical Society of America (2022)

Abstract
Within the paradigm of a recent model of speech production [Story and Bunton, JASA 146(4), 2522–2528], an utterance is specified as a sequence of relative acoustic events along a time axis. These events consist of directional changes of the vocal tract resonance frequencies, called resonance deflection patterns (RDPs), which, when associated with a temporal event function, are transformed via acoustic sensitivity functions into time-varying modulations of the vocal tract shape. RDPs specifying the targeted directional shift of the first three resonances for bilabial, alveolar, and velar consonants would be coded as [−1 −1 −1], [−1 1 1], and [−1 1 −1], respectively. In this study, these RDPs were combined with four vowels (“ih,” “ae,” “eh,” “uh”) to construct a set of 40 American English consonant-vowel-consonant (CVC) words. A word intelligibility test was conducted in which listeners heard a synthesized target word and were asked to indicate what they heard by choosing a word from a matrix that included the target and seven near-neighbor words. Results indicate that listener word recognition was aligned with the RDP settings, suggesting that RDPs are an effective discrete representation of phonetic segments that can be transformed into speech by modulation of the vocal tract shape.
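As a rough illustration of the discrete coding described above, the three consonant RDPs can be written as plain data and combined with the four vowel labels into CVC frames. This is only a sketch of the representation, not the authors' synthesis model: the names `RDP`, `VOWELS`, and `cvc_frame` are hypothetical, the reading of −1 as a downward shift and +1 as an upward shift is an assumption, and the temporal event functions and acoustic sensitivity functions are not modeled at all.

```python
from itertools import product

# RDP codes as quoted in the abstract: targeted direction of shift for the
# first three vocal tract resonances.
# Assumption: -1 is read as "shift downward" and +1 as "shift upward".
RDP = {
    "bilabial": (-1, -1, -1),
    "alveolar": (-1, 1, 1),
    "velar": (-1, 1, -1),
}

# Vowel labels from the abstract; no RDP values are given for them there,
# so they are carried as labels only.
VOWELS = ("ih", "ae", "eh", "uh")


def cvc_frame(onset, vowel, coda):
    """Return a hypothetical event sequence for one consonant-vowel-consonant frame."""
    return [
        ("C", onset, RDP[onset]),
        ("V", vowel, None),
        ("C", coda, RDP[coda]),
    ]


# Enumerate every onset place x vowel x coda place combination.
frames = [cvc_frame(o, v, c) for o, v, c in product(RDP, VOWELS, RDP)]

if __name__ == "__main__":
    for event in cvc_frame("bilabial", "ih", "velar"):
        print(event)
```

This enumeration yields place-of-articulation frames only. The abstract does not say how the 40 test words were selected, so the list built here should not be read as that word set.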
Keywords
synthetic words,discrete acoustic events,modulation,transformation