Age-Dependent Height Estimation and Speaker Normalization for Children's Speech Using the First Three Subglottal Resonances

16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols. 1-5 (2015)

Abstract
This paper proposes an age-dependent scheme for automatic height estimation and speaker normalization of children's speech, using the first three subglottal resonances (SGRs). Similar to previous work, our analysis indicates that children above the age of 11 years show different acoustic properties from those under 11. Therefore, an age-dependent model is investigated. The estimation algorithms for the first three SGRs are motivated by our previous research for adults. The algorithms for the first two SGRs have been applied to children's speech before. This paper proposes a similar approach to estimate Sg3 for children. The algorithm is trained and evaluated on 46 children, aged 6 to 17 years, using cross-validation. Average RMS errors in estimating Sg1, Sg2 and Sg3 using the age-dependent model are 51, 128 and 168 Hz, respectively. The height estimation algorithm employs a negative correlation between SGRs and height, and the mean absolute height estimation error was found to be less than 3.8 cm for the younger children and 4.9 cm for the older children. In addition, using TIDIGITS, a linear frequency warping scheme using age-dependent Sg3 gives statistically significant word error rate reductions (up to 26%) relative to conventional VTLN.
Keywords
children's speech, subglottal resonances, height estimation, speaker normalization
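
The linear frequency warping mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the warping formula, so this example assumes the warp factor is the ratio of a reference Sg3 to the speaker's estimated Sg3, and that warping is applied by resampling a magnitude spectrum. The function names, the reference value, and the example frequencies are all hypothetical and are not taken from the paper.

```python
import numpy as np


def warp_factor(sg3_speaker_hz, sg3_reference_hz):
    """Assumed linear warp factor: maps the speaker's Sg3 onto a reference Sg3."""
    if sg3_speaker_hz <= 0 or sg3_reference_hz <= 0:
        raise ValueError("Sg3 values must be positive")
    return sg3_reference_hz / sg3_speaker_hz


def warp_spectrum(magnitudes, freqs_hz, alpha):
    """Linearly warp a magnitude spectrum so content at f moves to alpha * f.

    The value at each output frequency f is read from the input spectrum at
    f / alpha using linear interpolation. np.interp holds the edge value for
    source frequencies that fall outside the input grid.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    source_freqs = freqs_hz / alpha
    return np.interp(source_freqs, freqs_hz, magnitudes)


# Illustrative values only (not from the paper): a child with Sg3 at 3000 Hz
# normalized toward a hypothetical reference Sg3 of 2400 Hz.
freqs = np.arange(0.0, 8001.0, 100.0)
spectrum = np.maximum(0.0, 1.0 - np.abs(freqs - 3000.0) / 500.0)  # peak at 3000 Hz
alpha = warp_factor(3000.0, 2400.0)
warped = warp_spectrum(spectrum, freqs, alpha)
peak_after_hz = freqs[np.argmax(warped)]
```

In this sketch the spectral peak that sat at the speaker's Sg3 ends up at the reference Sg3 after warping. A real front end would more likely apply the warp to the filterbank center frequencies than to each spectrum, which is a design choice the abstract does not specify.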