Bridging Mixture Density Networks With Meta-Learning For Automatic Speaker Identification

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (2020)

Abstract
Speaker identification answers the fundamental question "Who is speaking?" The identification technology enables various downstream applications to provide a personalized experience. Both the prevalent i-vector based solutions and the state-of-the-art deep learning solutions usually treat all users equally during training, with no distinction between new users and existing users. We observe that many new users start with limited labeled training data, which often results in inferior performance in recognizing their voices. To alleviate the disadvantage caused by this training data deficiency, we propose a Mixture Density Network-based Meta-Learning method (MDNML) for speaker identification. MDNML emphasizes quickly learning to recognize new users, each of whom has only a few seconds of labeled data. We conduct experiments on the LibriSpeech dataset and compare MDNML with four state-of-the-art baseline methods. The results show that MDNML achieves higher accuracy than all baseline methods in recognizing new users with limited labeled utterances. Our proposed solution significantly expedites learning by transferring the knowledge learned from the existing user base through gradient-based meta-learning. We consider our work to be a stepping stone toward more sophisticated meta-learning frameworks for accelerating voice recognition. Furthermore, we discuss a strategy for enhancing accuracy by incorporating the notion of household-based acoustic profiles into MDNML.
Keywords
mixture density networks, meta-learning, new users, speaker identification