Competing Speaker Count Estimation on the Fusion of the Spectral and Spatial Embedding Space.

INTERSPEECH(2020)

Abstract
This paper presents a method for estimating the number of competing speakers by fusing deep spectral and spatial embeddings. The basic idea is that mixed speech can be projected by a neural network into an embedding space in which embedding vectors are orthogonal for different speakers and parallel for the same speaker. The speaker count can therefore be estimated by computing the rank of the mean covariance matrix of the embedding vectors. The method also serves as a feature combination scheme that operates in the speaker embedding space rather than simply combining features at the input layer of the neural network. Experimental results show that the embedding-based method outperforms the classification-based method, in which the network directly predicts the number of speakers, and that spatial features help speaker count estimation. In addition, features combined in the embedding space yield more accurate speaker counting than features combined at the input layer of the neural network when tested on anechoic and reverberant datasets.
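The rank-based counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `estimate_speaker_count` and `make_toy_embeddings`, the relative eigenvalue threshold `rel_threshold`, and the synthetic embeddings are all assumptions made for illustration, and the abstract does not say how the rank is computed in practice. The sketch averages the outer products of the embedding vectors and counts the eigenvalues that exceed a fraction of the largest one, which serves as a numerical rank.

```python
import numpy as np


def make_toy_embeddings(num_speakers, dim=20, n_bins=600, noise=0.01, seed=0):
    """Hypothetical stand-in for network output: one unit-norm embedding per
    time-frequency bin, orthogonal across speakers, parallel within a speaker."""
    rng = np.random.default_rng(seed)
    # Orthonormal columns, one direction per speaker (reduced QR).
    basis, _ = np.linalg.qr(rng.standard_normal((dim, num_speakers)))
    # Assign bins to speakers in a round-robin fashion.
    labels = np.arange(n_bins) % num_speakers
    emb = basis[:, labels].T + noise * rng.standard_normal((n_bins, dim))
    # Normalise each embedding to unit length.
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def estimate_speaker_count(embeddings, rel_threshold=0.1):
    """Estimate the speaker count as the numerical rank of the mean
    outer-product (uncentred covariance) matrix of the embeddings.

    rel_threshold is an assumed tuning parameter, not taken from the paper.
    """
    emb = np.asarray(embeddings, dtype=float)
    # Mean of v v^T over all embedding vectors, shape (dim, dim).
    mean_cov = emb.T @ emb / emb.shape[0]
    # Eigenvalues of a symmetric matrix, sorted in descending order.
    eigvals = np.linalg.eigvalsh(mean_cov)[::-1]
    # Count eigenvalues that are significant relative to the largest one.
    return int(np.sum(eigvals > rel_threshold * eigvals[0]))


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, estimate_speaker_count(make_toy_embeddings(k)))
```

With ideal embeddings each speaker contributes one dominant eigenvalue, so the count of significant eigenvalues matches the number of speakers. On real network output the threshold would need tuning, since imperfect orthogonality spreads energy across further eigenvalues.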
Keywords
speaker count estimation, embedding space, competing speaker