Low-dimensional interference of mid-level sound statistics predicts human speech recognition in natural environmental noise.

bioRxiv: the preprint server for biology (2024)

Abstract
Recognizing speech in noise, such as in a busy street or restaurant, is an essential listening task whose difficulty varies across acoustic environments and noise levels. Yet, current cognitive models are unable to account for changing real-world hearing sensitivity. Here, using natural and perturbed background sounds, we demonstrate that the spectrum and modulation statistics of environmental backgrounds drastically impact human word recognition accuracy, and they do so independently of the noise level. These sound statistics can facilitate or hinder recognition: at the same noise level, accuracy can range from 0% to 100%, depending on the background. To explain this perceptual variability, we optimized a biologically grounded hierarchical model, consisting of frequency-tuned cochlear filters and subsequent mid-level modulation-tuned filters that account for central auditory tuning. Low-dimensional summary statistics from the mid-level model accurately predict single-trial perceptual judgments, accounting for more than 90% of the perceptual variance across backgrounds and noise levels, and substantially outperforming a cochlear model. Furthermore, perceptual transfer functions in the mid-level auditory space identify multi-dimensional natural sound features that impact recognition. Thus, speech recognition in natural backgrounds involves interference from multiple summary statistics that are well described by an interpretable, low-dimensional auditory model. Since this framework relates salient natural sound cues to single-trial perceptual judgments, it may improve outcomes for auditory prosthetics and clinical measurements of real-world hearing sensitivity.
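
The hierarchical model described in the abstract can be pictured as a two-stage pipeline: frequency-tuned cochlear filters produce band envelopes, modulation-tuned filters act on those envelopes, and a low-dimensional summary statistic per channel-rate pair feeds a readout that predicts single-trial recognition. The Python sketch below is purely illustrative and assumes a simplified Butterworth filter approximation; the channel count, modulation rates, the cochlear_envelopes/midlevel_stats helpers, and the logistic readout are all my own assumptions, not the authors' actual model parameterization or fitting procedure.

    # Minimal sketch of a two-stage auditory model: frequency-tuned "cochlear"
    # filters, then modulation-tuned filters whose low-dimensional summary
    # statistics predict single-trial word recognition. All filter shapes,
    # center frequencies, and the logistic readout are illustrative assumptions.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    FS = 16000  # sample rate (Hz), assumed

    def bandpass(x, lo, hi, fs):
        """Zero-phase Butterworth band-pass filter."""
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        return sosfiltfilt(sos, x)

    def cochlear_envelopes(x, fs=FS, n_channels=8):
        """Stage 1: log-spaced band-pass channels -> Hilbert envelopes."""
        edges = np.geomspace(100, 6000, n_channels + 1)
        return np.stack([np.abs(hilbert(bandpass(x, lo, hi, fs)))
                         for lo, hi in zip(edges[:-1], edges[1:])])

    def midlevel_stats(x, fs=FS, mod_rates=(2, 4, 8, 16, 32)):
        """Stage 2: modulation-tuned filters on each envelope, summarized
        as one log-power statistic per (channel, modulation rate) pair."""
        stats = []
        for env in cochlear_envelopes(x, fs):
            env = env - env.mean()
            for rate in mod_rates:
                band = bandpass(env, rate / np.sqrt(2), rate * np.sqrt(2), fs)
                stats.append(np.log(np.mean(band ** 2) + 1e-12))
        return np.array(stats)  # shape: (n_channels * len(mod_rates),)

    # Hypothetical readout: logistic regression from background-sound summary
    # statistics to single-trial correct/incorrect word recognition.
    if __name__ == "__main__":
        from sklearn.linear_model import LogisticRegression
        rng = np.random.default_rng(0)
        X = np.stack([midlevel_stats(rng.standard_normal(FS)) for _ in range(20)])
        y = rng.integers(0, 2, size=20)  # placeholder trial outcomes
        model = LogisticRegression(max_iter=1000).fit(X, y)
        print("predicted P(correct):", model.predict_proba(X[:3])[:, 1])

Here the summary statistics live in a space of n_channels * len(mod_rates) dimensions (40 in this sketch), which is the sense in which the readout is low-dimensional relative to the raw waveform.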