Voicing-, voiceless-, and non-glimpses in speech intelligibility prediction

Journal of the Acoustical Society of America(2023)

引用 0|浏览0
暂无评分
摘要
The number of speech spectro-temporal (S-T) regions escaping from noise masking, known as “glimpses,” is proportional to speech intelligibility in noise. Previous studies have demonstrated that intelligibility can be estimated by calculating the glimpse proportion (GP). More recent evidence revealed that the contribution of glimpses to intelligibility differs in the energy level of the glimpsed regions, and that even non-glimpsed regions play a non-negligible role in speech perception in noise. This study incorporated the voicing-viceless information in estimating intelligibility using glimpses. Before computing the GP, the counts of raw glimpsed regions or those with energy above the mean noise level were weighted according to the voicing-voiceless status of a frame where the glimpses were detected. Evaluated using speech signals processed to have thirteen glimpse compositions in both temporally stationary and fluctuating noise maskers, the linear correlation between model predictions and listeners' word recognition rates increased from 0.76 to 0.80 for weighted GP, and from 0.89 to 0.92 for weighted high-energy GP. Further taking the contribution from non-glimpsed regions into account in the model improved the correlation to 0.95, suggesting that intelligibility in noise can be better predicted when the contributions of different speech regions are finely modelled.
更多
查看译文
关键词
speech,prediction,non-glimpses
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要