The JHU-MIT System Description for NIST SRE18

semanticscholar(2019)

Abstract
This document describes the SRE18 systems developed jointly by the teams at JHU-CLSP, JHU-HLTCOE, MIT Lincoln Labs, MIT CSAIL and LSE-EPITA. All of the systems used neural network or i-vector embeddings with some flavor of PLDA back-end. Each system was tailored either to the video condition (VAST) or to the telephone condition (CMN2). For VAST, the primary system was a fusion of a 16 kHz TDNN x-vector, a 16 kHz factorized TDNN x-vector, an 8 kHz TDNN x-vector and an 8 kHz ResNet34-Attention embedding. For CMN2, the primary system was a fusion of two TDNN x-vectors and a ResNet34-Attention embedding. For development in the VAST condition, we used the SITW eval core-multi dataset, on which we obtained Cprimary=0.105. For the telephone condition, we used the SRE18 dev CMN2 set, on which we obtained Cprimary=0.256. The contrastive submissions included the best single system (JHU-HLTCOE, SITW Cp=0.137, CMN2 Cp=0.312) and the best fusions of 1, 2, 3,... systems from the JHU-CLSP-MIT sub-team.
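The abstract reports Cprimary values without defining the metric. In NIST speaker recognition evaluations, the primary cost is built from a normalized detection cost, so a minimal sketch of that building block may help when reading the numbers. The function names below are hypothetical, and the target prior used in the usage note is an illustrative assumption: the abstract does not state which operating points or averaging SRE18 applies per condition.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates when trials with score >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def normalized_dcf(p_miss, p_fa, p_target, c_miss=1.0, c_fa=1.0):
    """Normalized detection cost at a single operating point.

    p_target, c_miss and c_fa are assumptions supplied by the caller;
    they are not taken from the abstract.
    """
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Normalize by the cost of the best trivial system
    # (one that always rejects or always accepts).
    default_cost = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return cost / default_cost
```

A possible use, with made-up scores: call `error_rates(targets, nontargets, threshold)` and pass the resulting pair to `normalized_dcf(p_miss, p_fa, p_target=0.05)`. A value of 1.0 corresponds to a system no better than always rejecting, so the reported values of 0.105 and 0.256 sit well below that trivial baseline.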