Residual Time-restricted Self-Attentive TDNN Speaker Embedding for Noisy and Far-field Conditions

2022 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE)(2022)

Abstract
One of the emerging challenges in automatic speaker recognition is developing systems that are robust to noisy and far-field conditions. The standard x-vector speaker embedding is based on a time-delay neural network (TDNN). Under these conditions it is less robust than systems based on a residual network (ResNet) and than other baseline systems that apply signal-enhancement preprocessing. In this study, we improve the performance of TDNN-based embeddings by integrating a residual block with an optional time-restricted self-attention mechanism (AttResBlock) into the frame level of the TDNN. Experiments on the Voices Obscured in Complex Environmental Settings (VOiCES) corpus are carried out to evaluate the proposed speaker embedding extractor (AttResBlock-TDNN). The experimental results show that AttResBlock-TDNN outperforms state-of-the-art systems under many adverse conditions. For instance, the proposed AttResBlock-TDNN achieves relative improvements of 11.4% in minDCF and 15.5% in EER over the original TDNN-based encoder.
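The abstract does not specify layer sizes, context windows, the attention variant, or exactly where the AttResBlock sits in the frame-level stack. The NumPy sketch below is therefore only a hypothetical illustration of the general idea. It assumes single-head scaled dot-product attention in which frame t may attend only to frames t-left through t+right, a residual connection that adds the attended output back to the block input, and a placement between a TDNN layer and statistics pooling. All function names, parameter names, window sizes, and dimensions are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def tdnn_layer(x, w, context):
    """Hypothetical TDNN layer: splice frames at the given offsets, then affine + ReLU.
    x: (T, d_in); w: (len(context) * d_in, d_out). Zero padding keeps T frames."""
    T = x.shape[0]
    pad_l, pad_r = max(0, -min(context)), max(0, max(context))
    xp = np.pad(x, ((pad_l, pad_r), (0, 0)))
    # For offset c, row t of the slice is x[t + c] (or zero padding at the edges).
    spliced = np.concatenate([xp[pad_l + c : pad_l + c + T] for c in context], axis=1)
    return np.maximum(spliced @ w, 0.0)

def time_restricted_self_attention(x, w_q, w_k, w_v, left, right):
    """Single-head attention where frame t only attends to frames t-left..t+right."""
    T = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[1])
    idx = np.arange(T)
    offset = idx[None, :] - idx[:, None]          # offset[t, s] = s - t
    mask = (offset >= -left) & (offset <= right)  # the diagonal is always allowed
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)                      # masked entries become exactly 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

def att_res_block(x, params, left=3, right=3):
    """Hypothetical AttResBlock: ReLU(x + W_o * attention(x))."""
    attended, _ = time_restricted_self_attention(
        x, params["w_q"], params["w_k"], params["w_v"], left, right)
    return np.maximum(x + attended @ params["w_o"], 0.0)

def stats_pooling(h):
    """Concatenate the per-dimension mean and standard deviation over time."""
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])

# Toy forward pass with random weights (dimensions are arbitrary assumptions).
rng = np.random.default_rng(0)
T, d_in, d = 50, 24, 16
x = rng.standard_normal((T, d_in))
w1 = rng.standard_normal((3 * d_in, d)) * 0.1
params = {name: rng.standard_normal((d, d)) * 0.1 for name in ("w_q", "w_k", "w_v", "w_o")}

h0 = tdnn_layer(x, w1, (-2, 0, 2))
h1 = att_res_block(h0, params)
stats = stats_pooling(h1)
print(stats.shape)  # (32,)
```

Restricting attention to a local window keeps the cost per frame bounded and matches the TDNN's view of speech as locally correlated frames. The residual path lets the block fall back to the plain TDNN features when attention adds little.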
Keywords
speaker recognition, noisy conditions, far-field, speaker embedding, residual block, self-attention mechanism