Short-Segment Speaker Verification Using ECAPA-TDNN with Multi-Resolution Encoder

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Time-domain approaches have shown the potential to improve speaker verification performance, but the predominant approaches still rely on hand-crafted features such as mel filterbank energies. Although these features are based on speech perception models and have exhibited impressive performance, their fixed frame size does not allow good temporal and spectral resolution at the same time, and information is lost when the magnitude spectrum is taken and during frequency rescaling. In this paper, we propose to incorporate multi-resolution time-domain information into the ECAPA-TDNN speaker verification system. We construct a multi-resolution encoder that extracts multiple features at different temporal resolutions, and let the extracted features drive the adapter modules. Experimental results show that the proposed method outperforms other recently proposed approaches on the VoxCeleb dataset when the input length is 2 seconds or shorter. The proposed approach also shows superior performance on the Google Speech Commands dataset v2.
Keywords: speaker verification, multi-resolution, short-segments, learnable transformation, adapter module
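
The idea of extracting features from the raw waveform at several temporal resolutions can be illustrated with a minimal NumPy sketch. This is not the authors' encoder: the window lengths (40, 160, and 400 samples), the shared hop of 160 samples, the filter count, and the random weights standing in for learned filters are all assumptions chosen for illustration, and the adapter modules and the ECAPA-TDNN backbone are not shown. The sketch only demonstrates how short and long analysis windows can be applied to the same signal while keeping the frame sequences time-aligned.

```python
import numpy as np

def frame_signal(x, win, hop):
    """Slice a 1-D signal into overlapping frames of length `win`, one every `hop` samples."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape: (n_frames, win)

def multi_resolution_encode(x, windows=(40, 160, 400), hop=160, n_filters=32, seed=0):
    """Return one (n_frames, n_filters) feature matrix per window length.

    Hypothetical illustration: random matrices stand in for learnable filters.
    """
    rng = np.random.default_rng(seed)
    max_win = max(windows)
    feats = []
    for win in windows:
        # Trim the signal symmetrically so every resolution yields the same
        # number of frames, with frame centers aligned across resolutions.
        pad = max_win - win
        x_trim = x[pad // 2 : len(x) - (pad - pad // 2)]
        frames = frame_signal(x_trim, win, hop)
        # Stand-in for a learned 1-D convolution: one linear filterbank per resolution.
        weights = rng.standard_normal((win, n_filters)) / np.sqrt(win)
        feats.append(np.maximum(frames @ weights, 0.0))  # ReLU non-linearity
    return feats

# Example: a 2-second segment at an assumed 16 kHz sampling rate.
x = np.random.default_rng(1).standard_normal(32000)
for win, f in zip((40, 160, 400), multi_resolution_encode(x)):
    print(win, f.shape)
```

Short windows favor temporal resolution and long windows favor spectral resolution. Sharing the hop keeps the per-resolution outputs frame-aligned, so a downstream module could combine them frame by frame.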