Sub-vector Extraction and Cascade Post-Processing for Speaker Verification Using MLLR Super-vectors.

arXiv: Sound (2016)

Abstract
In this paper, we propose a speaker-verification system based on maximum likelihood linear regression (MLLR) super-vectors, in which speakers are characterized by m-vectors. These vectors are obtained by uniformly segmenting the speaker MLLR super-vector with an overlapped sliding window. We consider three approaches to estimating the MLLR transformation: one based on the conventional 1-best automatic transcription, one based on the lattice word transcription, and one based on a simple global universal background model (UBM). Session variability compensation is performed in a post-processing module using either probabilistic linear discriminant analysis (PLDA) or eigen factor radial (EFR). Alternatively, we propose a cascade post-processing scheme for the MLLR super-vector based speaker-verification system. In this case, the m-vectors or MLLR super-vectors are first projected onto a lower-dimensional vector space generated by linear discriminant analysis (LDA). PLDA session variability compensation and scoring are then applied to the reduced-dimensional vectors. This approach combines the advantages of both techniques and makes the PLDA parameters easier to estimate. Experimental results on telephone conversations from the NIST 2008 and 2010 speaker recognition evaluations (SRE) indicate that the proposed m-vector system performs significantly better than the conventional system based on full MLLR super-vectors. Cascade post-processing further reduces the error rate in all cases. Finally, we present the results of fusion with a standard i-vector system in the feature domain as well as in the score domain, demonstrating that the m-vector system is both competitive with and complementary to it.
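The sub-vector extraction step described in the abstract (uniform segmentation of a super-vector with an overlapped sliding window) can be illustrated with a minimal sketch. The function name `extract_m_vectors`, the parameters `window` and `shift`, and the toy values used below are assumptions for illustration only; the abstract does not state the window length, the overlap, or how a trailing partial window is handled (this sketch simply drops it).

```python
import numpy as np


def extract_m_vectors(supervector, window, shift):
    """Split a 1-D super-vector into overlapping, equal-length sub-vectors.

    Hypothetical helper: `window` is the sub-vector length and `shift` is the
    step between consecutive windows (shift < window gives overlap). Trailing
    elements that do not fill a complete window are dropped, which is an
    assumption of this sketch, not something stated in the abstract.
    """
    supervector = np.asarray(supervector, dtype=float)
    if supervector.ndim != 1:
        raise ValueError("supervector must be one-dimensional")
    if not (0 < shift <= window <= supervector.size):
        raise ValueError("require 0 < shift <= window <= len(supervector)")

    # Start index of every complete window.
    starts = range(0, supervector.size - window + 1, shift)
    # One row per sub-vector.
    return np.stack([supervector[s:s + window] for s in starts])


# Toy example: a 10-dimensional "super-vector", window of 4, shift of 2
# (50% overlap between consecutive sub-vectors).
m_vectors = extract_m_vectors(np.arange(10), window=4, shift=2)
print(m_vectors.shape)
```

In a cascade setup like the one the abstract outlines, each row would then be passed to an LDA projection followed by PLDA scoring; those stages are not sketched here because the abstract gives no details about their configuration.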