Tandem Representations of Spectral Envelope and Modulation Frequency Features for ASR
INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Vols. 1-5 (2009)
Abstract
We present a feature extraction technique for automatic speech recognition (ASR) that uses Tandem representations of short-term spectral envelope and modulation frequency features. These features, derived from sub-band temporal envelopes of speech estimated using frequency domain linear prediction (FDLP), are combined at the phoneme posterior level. Tandem representations derived from these phoneme posteriors are used with HMM-based ASR systems for both small vocabulary and large vocabulary continuous speech recognition (LVCSR) tasks. For a small vocabulary continuous digit task on the OGI Digits database, the proposed features reduce the word error rate (WER) by 13% relative to other feature extraction techniques. We obtain a relative reduction of about 14% in WER for an LVCSR task using the NIST RT05 evaluation data. For phoneme recognition tasks on the TIMIT database, these features provide a relative improvement of 13% compared to other techniques.
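As a rough illustration of combining two feature streams at the phoneme posterior level and then deriving Tandem features, the sketch below uses synthetic posteriors. The abstract does not state the combination rule or the post-processing, so the weighted average, the log transform followed by PCA decorrelation, and all names and sizes (`combine_posteriors`, `tandem_features`, 40 phoneme classes, 25 output dimensions) are assumptions made for illustration, not the authors' method.

```python
import numpy as np


def combine_posteriors(post_a, post_b, weight=0.5):
    """Combine two (frames x phonemes) posterior matrices.

    Assumption: a weighted average followed by renormalization.
    The paper's actual combination rule is not given in the abstract.
    """
    combined = weight * post_a + (1.0 - weight) * post_b
    return combined / combined.sum(axis=1, keepdims=True)


def tandem_features(posteriors, n_components, eps=1e-10):
    """Turn posteriors into Tandem-style features for an HMM system.

    Assumption: log transform, mean removal, then PCA (KLT) to
    decorrelate and reduce dimensionality.
    """
    log_post = np.log(posteriors + eps)
    centered = log_post - log_post.mean(axis=0, keepdims=True)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]


def random_posteriors(rng, n_frames, n_phones):
    """Synthetic stand-in for the output of a posterior estimator."""
    logits = rng.normal(size=(n_frames, n_phones))
    expd = np.exp(logits - logits.max(axis=1, keepdims=True))
    return expd / expd.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
spectral_post = random_posteriors(rng, n_frames=200, n_phones=40)
modulation_post = random_posteriors(rng, n_frames=200, n_phones=40)

combined = combine_posteriors(spectral_post, modulation_post)
features = tandem_features(combined, n_components=25)
```

In this sketch each row of `combined` remains a valid probability distribution, and the columns of `features` are mutually decorrelated, which suits the diagonal-covariance Gaussian mixtures commonly used in HMM-based recognizers.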
Keywords
Frequency Domain Linear Prediction (FDLP), Spectral Envelope Features, Modulation Frequency Features, Tandem based ASR systems