Feature Mapping, Score-, And Feature-Level Fusion For Improved Normal And Whispered Speech Speaker Verification

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016

Abstract
In this paper, automatic speaker verification using normal and whispered speech is explored. Typically, for speaker verification systems with varying vocal effort inputs, standard solutions such as feature mapping or addition of data during parameter estimation (training) and enrollment stages result in a trade-off between accuracy gains with whispered test data and accuracy losses (up to 70% in equal error rate, EER) with normal test data. To overcome this shortcoming, this paper proposes two innovations. First, we show the complementarity of features derived from AM-FM models over conventional mel-frequency cepstral coefficients, thus signalling the importance of instantaneous phase information for whispered speech speaker verification. Next, two fusion schemes are explored: score- and feature-level fusion. Overall, we show that gains as high as 30% and 84% in EER can be achieved for normal and whispered speech, respectively, using feature-level fusion.
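
As a rough illustration of the two fusion schemes named in the abstract, the Python sketch below contrasts score-level fusion (a weighted sum of per-trial scores from separate MFCC and AM-FM i-vector subsystems) with feature-level fusion (frame-level concatenation of the two feature streams before a single system is trained). The fusion weight alpha, the array shapes, and the feature dimensions are illustrative assumptions, not values taken from the paper.

import numpy as np

# Hypothetical per-trial verification scores from two i-vector subsystems,
# one built on MFCCs and one on AM-FM features (values are illustrative).
scores_mfcc = np.array([1.8, -0.4, 2.1, 0.3])
scores_amfm = np.array([1.2, -0.9, 1.7, 0.8])

# Score-level fusion: weighted sum of the subsystem scores.
# The weight alpha would normally be tuned on a development set.
alpha = 0.6
fused_scores = alpha * scores_mfcc + (1.0 - alpha) * scores_amfm

# Feature-level fusion: concatenate time-aligned frame-level features
# from both streams, so one system is trained on the joint representation.
feats_mfcc = np.random.randn(300, 20)   # (num_frames, 20 MFCCs), illustrative
feats_amfm = np.random.randn(300, 20)   # (num_frames, 20 AM-FM features), illustrative
fused_feats = np.concatenate([feats_mfcc, feats_amfm], axis=1)  # shape (300, 40)

print(fused_scores)
print(fused_feats.shape)
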
Keywords
Whispered speech, AM-FM model, i-vectors, speaker verification, system fusion, feature mapping