Multi-Task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech
INTERSPEECH(2019)
摘要
The paper aims to address the task of speaker verification with single-channel, noisy and far-field speech by learning an embedding or feature representation that is invariant to different acoustic environments. We approach from two different directions. First, we adopt a newly proposed discriminative model that hybridizes Deep Neural Network (DNN) and Total Variability Model (TVM) with the goal of integrating their strengths. DNN helps learning a unique variable length representation of the feature sequence while TVM accumulates them into a fixed dimensional vector. Second, we propose a multitask training scheme with cross entropy and triplet losses in order to obtain good classification performance as well as distinctive speaker embeddings. The multi-task training is applied on both the DNN-TVM model and state-of-the-art x-vector system. The results on the development and evaluation sets of the VOiCES challenge reveal that the proposed multi-task training helps improving models that are solely based on cross entropy, and it works better with DNN-TVM architecture than x-vector for the current task. Moreover, the multi-task models tend to show complementary relationship with cross entropy models, and thus improved performance is observed after fusion.
更多查看译文
关键词
Speaker verification, deep neural networks, total variability model, multi-task training
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要