2 Histogram Equalization for Robust Speech Recognition

Semantic Scholar (2008)

Abstract
Automatic speech recognition performs optimally when it is evaluated under conditions identical to those in which the recognition system was trained. In real-world speech applications this almost never happens. Several sources of variability produce mismatches between training and test conditions. Depending on their physical or emotional state, speakers produce sounds with unwanted variations that carry no acoustically relevant information. The phonetic context of the sounds also introduces undesired variations. On top of these intra-speaker variations come inter-speaker variations, which are related to the peculiarities of each speaker's vocal tract, gender, socio-linguistic environment, etc. A third source of variability comes from changes in the speaker's environment and from the characteristics of the channel used to communicate. The strategies used to eliminate this environmental group of variability sources are called robust recognition techniques. Robust speech recognition therefore seeks to make recognition as insensitive as possible to changes in the evaluation environment. Robustness techniques constitute a fundamental area of research in speech processing. The current challenges for automatic speech recognition can be framed within these lines of work:
• Recognition of coded speech over telephone channels. This task adds a further difficulty: each telephone channel has its own SNR and frequency response, so recognition over telephone lines must perform channel adaptation with very little channel-specific data.
• Low-SNR environments. In the 1980s, speech recognition was done in a quiet room with a desktop microphone. Today, the scenarios demanding automatic speech recognition include:
    • Mobile phones.
    • Moving cars.
    • Spontaneous speech.
    • Speech masked by other speech.
    • Speech masked by music.
    • Non-stationary noises.
• Co-channel voice interference. Interference caused by other speakers is a bigger challenge than changes in the recognition environment caused by wide-band noise.

Open Access Database www.intechweb.org
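The abstract names histogram equalization (HEQ) as the chapter's approach but does not describe how it works. The sketch below follows one common formulation, offered as an illustration and not as the authors' exact method. HEQ estimates the empirical cumulative distribution of each feature dimension over an utterance. It then maps every value onto the matching quantile of a fixed reference distribution. Any monotonic distortion that the environment or channel applies to a feature is therefore largely undone. The standard normal reference, the rank-based CDF estimate with a 0.5 offset, and the function name `histogram_equalize` are all illustrative assumptions.

```python
from statistics import NormalDist


def histogram_equalize(frames):
    """Hypothetical HEQ sketch: map each feature dimension of one utterance
    onto an assumed standard normal reference distribution.

    frames: list of feature vectors (one list of floats per frame).
    Returns a new list of vectors with the same shape.
    """
    n_frames = len(frames)
    if n_frames == 0:
        return []
    n_dims = len(frames[0])
    # Assumed reference distribution; other references are possible.
    reference = NormalDist(mu=0.0, sigma=1.0)
    out = [[0.0] * n_dims for _ in range(n_frames)]
    for d in range(n_dims):
        # Rank the frames by their value in dimension d (ties broken by sort order).
        order = sorted(range(n_frames), key=lambda t: frames[t][d])
        for rank, t in enumerate(order):
            # Empirical CDF estimate, offset by 0.5 so it stays strictly in (0, 1).
            p = (rank + 0.5) / n_frames
            # Replace the value with the reference quantile at the same CDF level.
            out[t][d] = reference.inv_cdf(p)
    return out


if __name__ == "__main__":
    clean = [[1.0, 10.0], [5.0, 30.0], [3.0, 20.0]]
    # Simulate a monotonic "channel" distortion: scale and offset each value.
    distorted = [[2.0 * x + 7.0 for x in frame] for frame in clean]
    print(histogram_equalize(clean) == histogram_equalize(distorted))
```

The output depends only on the rank of each value within its dimension. Any strictly increasing distortion of a feature therefore yields the same equalized values, which is the property that makes this kind of mapping attractive against channel and environment mismatch. A real system would typically apply it to cepstral features and may handle ties or short utterances differently.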