Comparative study of different features on OLLO logatome recognition task

msra(2006)

Abstract
We compare the ASR performance of different feature sets (MFCC, PLP, constant JRASTA PLP, and variable-scale piecewise quasi-stationary analyzed MFCC features (1)) on the OLdenburg LOgatome speech corpus (OLLO) (2). The OLLO database is rich in speech variability, covering different speaking styles (slow, fast, statement, questioning, loud, and soft), with almost equal sampling of male and female speakers. An HMM-GMM system was trained on the no-accent part of the OLLO database, roughly 13,500 utterances, and then tested on the no-accent part of the test set, roughly 13,800 utterances. Each utterance corresponds to one logatome. We compare state-of-the-art fixed-time-scale features (20 ms analysis windows) with the recently proposed variable-scale quasi-stationary analyzed MFCC features (1). This technique performs a variable-scale time-spectral analysis, adaptively estimating the largest possible analysis window size over which the signal remains quasi-stationary, and thus achieves the best temporal/frequency resolution tradeoff. Speech recognition experiments on the OLLO database show that the proposed variable-scale piecewise stationary spectral analysis based features indeed yield improved recognition accuracy in clean conditions compared to MFCC, PLP, and constant-JRASTA PLP features.
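The idea of "the largest possible analysis window such that the signal remains quasi-stationary" can be illustrated with a small sketch. Starting from the fixed 20 ms window used by conventional MFCC/PLP front-ends, the window is grown in steps and the last length that still passes a stationarity test is kept. This is only an illustration under stated assumptions: the stationarity test here is a simple generalized likelihood ratio (GLR) for a variance change in a zero-mean Gaussian signal, swapped in for clarity; it is not the criterion defined in (1). The function names and the parameters `min_ms`, `max_ms`, `step_ms`, and `threshold` are hypothetical.

```python
import math
from typing import Sequence


def _ml_variance(x: Sequence[float]) -> float:
    """Maximum-likelihood variance of a zero-mean segment, floored to avoid log(0)."""
    return max(sum(v * v for v in x) / len(x), 1e-12)


def glr_variance_change(segment: Sequence[float]) -> float:
    """2 * log GLR: 'one zero-mean Gaussian' vs. 'two Gaussians split at the midpoint'.

    ASSUMPTION: a simplified stand-in stationarity test, not the method of (1).
    Large values suggest the variance changes inside the segment.
    """
    n = len(segment)
    half = n // 2
    left, right = segment[:half], segment[half:]
    return (n * math.log(_ml_variance(segment))
            - len(left) * math.log(_ml_variance(left))
            - len(right) * math.log(_ml_variance(right)))


def largest_stationary_window(signal: Sequence[float], start: int, sample_rate: int,
                              min_ms: float = 20, max_ms: float = 60,
                              step_ms: float = 5, threshold: float = 10.0) -> int:
    """Return the largest window length (in samples) starting at `start` that
    passes the stationarity test. All parameters are hypothetical choices;
    min_ms = 20 mirrors the fixed-scale baseline window."""

    def to_samples(ms: float) -> int:
        return int(round(ms * sample_rate / 1000))

    best = to_samples(min_ms)  # the fixed-scale window is always accepted
    ms = min_ms + step_ms
    while ms <= max_ms:
        n = to_samples(ms)
        if start + n > len(signal):
            break  # not enough signal left to extend the window
        if glr_variance_change(signal[start:start + n]) > threshold:
            break  # extending further would span a non-stationary region
        best = n
        ms += step_ms
    return best
```

For example, with a 16 kHz toy signal whose amplitude jumps by a factor of ten after 30 ms, the window grows from 20 ms up to 30 ms (480 samples) and stops there, whereas a constant-amplitude signal lets it reach the 60 ms cap. A real front-end would then compute MFCCs over each selected window and advance to the next segment.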
Keywords
speech recognition