Prototypical speaker-interference loss for target voice separation using non-parallel audio samples

Conference of the International Speech Communication Association (INTERSPEECH)(2022)

Cited 0|Views22
No score
Abstract
In this paper, we propose a new prototypical loss function for training neural network models for target voice separation. Conventional methods use paired parallel audio samples of the target speaker with and without an interfering speaker or noise, and minimize the spectrographic mean squared error (MSE) between the clean and enhanced target speaker audio. Motivated by the use of contrastive loss in speaker recognition task, we had earlier proposed a speaker representation loss that uses representative samples from the target speaker in addition to the conventional MSE loss. In this work, we propose a prototypical speaker-interference (PSI) loss, that makes use of representative samples from the target speaker, interfering speaker as well as the interfering noise to better utilize any non-parallel data that may be available. The performance of the proposed loss function is evaluated using VoiceFilter, a popular framework for target voice separation. Experimental results show that the proposed PSI loss significantly improves the PESQ scores of the enhanced target speaker audio.
More
Translated text
Key words
target voice separation,speaker-interference,non-parallel
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined