Augmenting standard speech recognition features with energy gravity centres

R. De Mori, L. Moisa, R. Gemello, F. Mana, D. Albesano

Periodicals (2001)

Abstract

This paper investigates the possibility of adding new features to the classical Mel Scaled Cepstral Coefficients (MFCCs) and their time derivatives. A hybrid Automatic Speech Recognition (ASR) system is used, based on a Neural Network (NN) and a collection of Hidden Markov Models (HMMs). It is shown that the gravity centres (GCs) of the energies in the frequency bands of the first three formants, together with their first and second time derivatives, can be added to the classical set of MFCCs and their first and second time derivatives, resulting in significant performance improvements. Nevertheless, in some cases the added parameters may have a negative effect on performance, because they are reliable only for certain types of sounds: their values may exhibit large variations for the same sound in the presence of additive noise. Experiments have shown that one solution is to introduce a reliability index indicating the importance the newly added parameters should have in describing a given frame. NNs appear to be suitable devices for taking this fact into account in the computation of observation probabilities. Experiments have also shown improvements when GCs are computed from zero-crossing intervals detected at the output of the filters of an ear model. Intensities are obtained by associating a nonlinear peak amplitude coding with each zero-crossing interval. Consistent improvements are observed when the above-mentioned solutions are applied with medium as well as large lexicons in the presence of additive noise.
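As a rough illustration of the central idea, the gravity centre of the energy in a frequency band can be read as the energy-weighted mean frequency within that band. The sketch below is not the authors' implementation: it assumes an FFT power spectrum rather than the ear-model filters mentioned in the abstract, the band edges in `FORMANT_BANDS_HZ` are hypothetical values chosen only for illustration, and the function names are invented here.

```python
import numpy as np

# Hypothetical band edges in Hz, loosely covering typical ranges of the first
# three formants. The abstract does not state the bands actually used.
FORMANT_BANDS_HZ = [(200.0, 900.0), (900.0, 2500.0), (2500.0, 3500.0)]


def gravity_centres(frame, sample_rate, bands=FORMANT_BANDS_HZ):
    """Return the energy-weighted mean frequency (Hz) of each band for one frame."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    centres = []
    for low, high in bands:
        mask = (freqs >= low) & (freqs < high)
        energy = power[mask].sum()
        if energy > 0.0:
            centres.append(float((freqs[mask] * power[mask]).sum() / energy))
        else:
            # Assumption: fall back to the band midpoint when the band is silent.
            centres.append(0.5 * (low + high))
    return np.array(centres)


def time_derivatives(features):
    """First and second time derivatives by finite differences over frames (axis 0)."""
    delta = np.gradient(features, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return delta, delta2
```

Applied frame by frame, `gravity_centres` would give three values per frame, and `time_derivatives` would supply the first and second derivatives that are appended to the MFCC vector. The reliability index described in the abstract is not modelled here, since the abstract does not say how it is computed.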
Keywords
hidden Markov model, automatic speech recognition, indexation, neural network, speech recognition