Effective use of DCTs for contextualizing features for speaker recognition

ICASSP(2014)

Abstract
This article proposes a new approach for contextualizing features for speaker recognition through the discrete cosine transform (DCT). Specifically, we apply a 2D-DCT to the Mel filterbank outputs to replace the common Mel frequency cepstral coefficients (MFCCs) appended with deltas and double deltas. A thorough comparison of algorithms for delta computation and DCT-based contextualization for speaker recognition is provided, and the effect of varying the size of the analysis window in each case is considered. Selecting 2D-DCT coefficients using a zig-zag approach permits definition of an arbitrary feature dimension using the most energized coefficients. We show that 60 coefficients computed using our approach outperform the standard MFCCs appended with double deltas by up to 25% relative on the NIST 2012 speaker recognition evaluation (SRE) corpus in both Cprimary and equal error rate (EER), while additional coefficients increase system robustness to noise.
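The contextualization step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the block layout (bands x frames), the orthonormal DCT-II scaling, the JPEG-style zig-zag direction, and all function names (`dct2`, `zigzag_indices`, `contextualized_features`) are assumptions made for the example. The abstract does not specify window length, filterbank size, or any compression or normalization applied to the filterbank outputs.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (assumed scaling)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def dct2(block):
    """2D DCT-II of a (bands x frames) block of filterbank outputs."""
    c_freq = dct_matrix(len(block))      # transform across Mel bands
    c_time = dct_matrix(len(block[0]))   # transform across context frames
    return matmul(matmul(c_freq, block), transpose(c_time))

def zigzag_indices(n_rows, n_cols):
    """(row, col) pairs ordered by anti-diagonal, alternating direction.

    Assumption: JPEG-style ordering starting at the low-order corner.
    """
    order = []
    for s in range(n_rows + n_cols - 1):
        diag = [(r, s - r) for r in range(n_rows) if 0 <= s - r < n_cols]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def contextualized_features(block, n_coeffs):
    """Keep the first n_coeffs 2D-DCT coefficients in zig-zag order."""
    coeffs = dct2(block)
    order = zigzag_indices(len(coeffs), len(coeffs[0]))
    return [coeffs[r][c] for r, c in order[:n_coeffs]]
```

Calling `contextualized_features(block, 60)` on a block with at least 60 entries would yield a 60-dimensional vector, matching the feature dimension quoted in the abstract; any other dimension can be chosen the same way, which is the flexibility the zig-zag selection provides.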
Keywords
speaker recognition, most energized coefficients, analysis window size, mel filter bank outputs, discrete cosine transform, equal error rate, 2d-dct transform, discrete cosine transforms, double deltas, channel bank filters, 2d-dct, eer, sre, dct-based contextualization, contextualization, contextualizing features, mfccs, arbitrary feature dimension, deltas, 2d-dct coefficient selection, nist 2012 speaker recognition evaluation corpus, mel frequency cepstral coefficients, filterbank energies, zig-zag approach, noise measurement, speech, speech recognition, feature extraction, nist