Studying the Effect of Frame-Level Concatenation of GFCC and TS-MFCC Features on Zero-Shot Children's ASR

Ankita, Shambhavi, Syed Shahnawazuddin

SPEECH AND COMPUTER, SPECOM 2023, PT II (2023)

Abstract
The work presented in this paper aims to improve recognition performance on the zero-shot children's speech recognition task through frame-level concatenation of two complementary front-end acoustic features. The features chosen are TANDEM-STRAIGHT-based Mel-frequency cepstral coefficients (TS-MFCC) and Gammatone frequency cepstral coefficients (GFCC). GFCC model the cochlear response of the human auditory system, whereas MFCC features model human pitch perception. The two feature sets therefore capture acoustic information differently and are only weakly correlated. Consequently, concatenating the TS-MFCC and GFCC feature vectors helps model a wider range of relevant, complementary acoustic information, which in turn improves recognition performance significantly. The experimental evaluations presented in this paper show that feature concatenation achieves a relative reduction of nearly 12%.
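As an illustration of what frame-level concatenation means, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The function name `concat_frame_level`, the 13-dimensional feature sizes, the random placeholder data, and the choice to trim both streams to the shorter frame count are all assumptions made for the example. Extraction of the TS-MFCC and GFCC features themselves is not shown.

```python
import numpy as np


def concat_frame_level(feat_a, feat_b):
    """Join two (num_frames, dim) feature matrices frame by frame.

    Hypothetical helper: if the two streams differ in frame count,
    both are trimmed to the shorter one (an assumption of this sketch).
    """
    if feat_a.ndim != 2 or feat_b.ndim != 2:
        raise ValueError("both inputs must be 2-D (num_frames, dim) arrays")
    # Align the two streams on a common number of frames.
    num_frames = min(feat_a.shape[0], feat_b.shape[0])
    # Stack along the feature axis so each frame holds both vectors.
    return np.hstack([feat_a[:num_frames], feat_b[:num_frames]])


# Random placeholders standing in for real TS-MFCC and GFCC features.
rng = np.random.default_rng(0)
ts_mfcc = rng.standard_normal((100, 13))  # assumed: 100 frames, 13 coefficients
gfcc = rng.standard_normal((99, 13))      # assumed: 99 frames, 13 coefficients

combined = concat_frame_level(ts_mfcc, gfcc)
print(combined.shape)
```

Each row of `combined` holds the TS-MFCC vector of a frame followed by the GFCC vector of the same frame, so the acoustic model sees both representations at every time step.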
Keywords
Zero-shot children's ASR, TS-MFCC, Feature concatenation, GFCC