
Deep Neural Network Approaches to Speaker and Language Recognition

IEEE Signal Processing Letters, no. 10 (2015): 1671–1675


Abstract

The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR). Prior work has shown performance gains for separate SR and LR tasks using DNNs for direct …

Introduction
  • The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) [1] have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR) [2]–[11].
  • The first or “direct” method uses a DNN trained as a classifier for the intended recognition task directly to discriminate between speakers for SR [5], [11] or languages for LR [4].
  • The second or “indirect” method uses a DNN possibly trained for a different purpose to extract data that is used to train a secondary classifier for the intended recognition task.
  • Date of publication April 06, 2015; date of current version April 21, 2015.
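The indirect method described above takes frame-level activations from a narrow hidden (bottleneck) layer of a DNN and feeds them to a secondary classifier. A minimal sketch of extracting such bottleneck features (BNFs) is shown below; the layer sizes, random weights, and ReLU nonlinearity are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: acoustic input -> hidden -> bottleneck -> hidden -> senone posteriors.
sizes = [39, 512, 64, 512, 1000]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def bottleneck_features(frames, bn_layer=2):
    """Forward frames through the DNN, stopping at the bottleneck layer."""
    h = frames
    for i, (W, b) in enumerate(zip(weights, biases), start=1):
        h = np.maximum(h @ W + b, 0.0)  # affine layer + ReLU (stand-in nonlinearity)
        if i == bn_layer:               # these activations are the BNFs
            return h
    return h

frames = rng.standard_normal((10, 39))  # 10 frames of 39-dim acoustic features
bnf = bottleneck_features(frames)
print(bnf.shape)  # (10, 64)
```

The per-frame BNF vectors would then replace (or augment) cepstral features in a conventional i-vector pipeline.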
Highlights
  • The impressive gains in performance obtained using deep neural networks (DNNs) for automatic speech recognition (ASR) [1] have motivated the application of DNNs to other speech technologies such as speaker recognition (SR) and language recognition (LR) [2]–[11]
  • In Section IV we describe initial LR experiments which motivate the focus on two indirect methods
  • This paper has described the development of a DNN BNF i-vector system
  • On LRE11, the same BNFs decreased error rates at 30 s, 10 s, and 3 s durations by 48%, 39%, and 24%, respectively, and even outperformed a 5-system fusion of acoustic and phonetic based recognizers [21]
  • Further reductions in error were demonstrated on the DAC13
  • Using tandem features led to a larger reduction in error rate of 23% for EER and
Results
  • DAC13 out-of-domain condition and a 48% reduction on the LRE11 30 s test condition.
  • Using tandem features led to a larger reduction in error rate of 23% for EER and.
  • LRE11 task led to 16%, 13%, and 8% reductions on the 30 s, 10 s, and 3 s duration conditions
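The tandem features mentioned above pair the cepstral and bottleneck streams by concatenating them frame-by-frame. A minimal sketch, with illustrative (hypothetical) feature dimensions:

```python
import numpy as np

def tandem_features(mfcc, bnf):
    """Concatenate cepstral and bottleneck features frame-by-frame."""
    assert mfcc.shape[0] == bnf.shape[0], "streams must be frame-aligned"
    return np.concatenate([mfcc, bnf], axis=1)

mfcc = np.zeros((100, 20))  # 100 frames of 20-dim MFCCs (illustrative sizes)
bnf = np.zeros((100, 64))   # frame-aligned 64-dim bottleneck features
print(tandem_features(mfcc, bnf).shape)  # (100, 84)
```

The concatenated stream is then modeled by the same back-end (e.g. a GMM/i-vector system) as a single feature vector per frame.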
Conclusion
  • This paper has described the development of a DNN BNF i-vector system and demonstrated substantial performance gains when applying the system to both the DAC13 SR and LRE11 LR benchmarks.
  • For the DAC13 task the BNF/GMM system was shown to reduce the error rates of the baseline MFCC/GMM system by 26% for EER and 33% for DCF for the in-domain task and 55% for EER and 47% for DCF for the out-of-domain task.
  • Further gains were obtained on the SR task using score fusion or tandem features.
  • Fusing the BNF/GMM and MFCC/DNN system scores reduced the error rates relative to the BNF/GMM system by 18% for EER and.
  • Using tandem features led to a larger reduction in error rate of 23% for EER and
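Score fusion of the BNF/GMM and MFCC/DNN systems, as described, amounts to a weighted linear combination of the two recognizers' per-trial scores. A minimal sketch; the scores and equal weights below are hypothetical, since in practice fusion weights are trained on a held-out development set:

```python
def fuse_scores(scores_a, scores_b, w_a=0.5, w_b=0.5, bias=0.0):
    """Linear score-level fusion of two recognizers' per-trial scores."""
    return [w_a * a + w_b * b + bias for a, b in zip(scores_a, scores_b)]

bnf_gmm = [1.2, -0.4, 0.9]   # hypothetical trial scores from the BNF/GMM system
mfcc_dnn = [0.8, -0.9, 1.1]  # hypothetical trial scores from the MFCC/DNN system
print(fuse_scores(bnf_gmm, mfcc_dnn))  # approximately [1.0, -0.65, 1.0]
```

Fusion helps when the two systems make partially uncorrelated errors, as a cepstral and a phonetically informed system typically do.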
Tables
  • Table 1: INITIAL LR DIRECT DNN PERFORMANCE
  • Table 2: LRE11 RESULTS
  • Table 3: INITIAL LR INDIRECT DNN PERFORMANCE
  • Table 4: OUT-OF-DOMAIN DAC13 RESULTS
  • Table 5: FUSION OF ALL SYSTEMS AND THE TOP 2 SYSTEMS ON DAC13
  • Table 6: IN-DOMAIN DAC13 RESULTS
  • Table 7: LRE11 FUSION
Funding
  • This work was supported by the U
References
  • [1] G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition,” IEEE Signal Process. Mag., pp. 82–97, Nov. 2012.
  • [2] Y. Song, B. Jiang, Y. Bao, S. Wei, and L.-R. Dai, “I-vector representation based on bottleneck features for language identification,” Electron. Lett., pp. 1569–1580, 2013.
  • [3] P. Matejka, L. Zhang, T. Ng, H. S. Mallidi, O. Glembek, J. Ma, and B. Zhang, “Neural network bottleneck features for language identification,” in Proc. IEEE Odyssey, 2014, pp. 299–304.
  • [4] I. Lopez-Moreno, J. Gonzalez-Dominguez, O. Plchot, D. Martinez, J. Gonzalez-Rodriguez, and P. Moreno, “Automatic language identification using deep neural networks,” in Proc. ICASSP, 2014, pp. 5374–5378.
  • [5] T. Yamada, L. Wang, and A. Kai, “Improvement of distant-talking speaker identification using bottleneck features of DNN,” in Proc. Interspeech, 2013, pp. 3661–3664.
  • [6] Y. Lei, N. Scheffer, L. Ferrer, and M. McLaren, “A novel scheme for speaker recognition using a phonetically-aware deep neural network,” in Proc. ICASSP, 2014, pp. 1714–1718.
  • [7] Y. Lei, L. Ferrer, A. Lawson, M. McLaren, and N. Scheffer, “Application of convolutional neural networks to language identification in noisy conditions,” in Proc. IEEE Odyssey, 2014, pp. 287–292.
  • [8] P. Kenny, V. Gupta, T. Stafylakis, P. Ouellet, and J. Alam, “Deep neural networks for extracting Baum-Welch statistics for speaker recognition,” in Proc. IEEE Odyssey, 2014, pp. 293–298.
  • [9] O. Ghahabi and J. Hernando, “I-vector modeling with deep belief networks for multi-session speaker recognition,” in Proc. IEEE Odyssey, 2014, pp. 305–310.
  • [10] S. Yaman, J. Pelecanos, and R. Sarikaya, “Bottleneck features for speaker recognition,” in Proc. IEEE Odyssey, 2012.
  • [11] E. Variani, X. Lei, E. McDermott, I. Lopez-Moreno, and J. Gonzalez-Dominguez, “Deep neural networks for small footprint text-dependent speaker verification,” in Proc. ICASSP, 2014, pp. 4080–4084.
  • [12] A. K. Sarkar, C.-T. Do, V.-B. Le, and C. Barras, “Combination of cepstral and phonetically discriminative features for speaker verification,” IEEE Signal Process. Lett., vol. 21, no. 9, pp. 1040–1044, Sep. 2014.
  • [13] N. Dehak, P. Kenny, R. Dehak, P. Ouellet, and P. Dumouchel, “Front end factor analysis for speaker verification,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 4, pp. 788–798, May 2011.
  • [14] N. Dehak, P. Torres-Carrasquillo, D. Reynolds, and R. Dehak, “Language recognition via i-vectors and dimensionality reduction,” in Proc. Interspeech, 2011, pp. 857–860.
  • [15] Y. Zhang, E. Chuangsuwanich, and J. Glass, “Extracting deep neural network bottleneck features using low-rank matrix factorization,” in Proc. ICASSP, 2014, pp. 185–189.
  • [16] L. Deng, G. Hinton, and B. Kingsbury, “New types of deep neural network learning for speech recognition and related applications: An overview,” in Proc. ICASSP, 2013.
  • [17] D. Garcia-Romero, X. Zhang, A. McCree, and D. Povey, “Improving speaker recognition performance in the domain adaptation challenge using deep neural networks,” in Proc. IEEE SLT Workshop, 2014.
  • [18] K. Vesely, M. Karafiat, and F. Grezl, “Convolutive bottleneck network features for LVCSR,” in Proc. IEEE ASRU, 2011, pp. 42–47.
  • [19] T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran, “Low-rank matrix factorization for deep neural network training with high-dimensional output targets,” in Proc. ICASSP, 2013.
  • [20] D. Garcia-Romero and C. Y. Espy-Wilson, “Analysis of i-vector length normalization in speaker recognition systems,” in Proc. Interspeech, 2011, pp. 249–252.
  • [21] E. Singer, P. Torres-Carrasquillo, D. Reynolds, A. McCree, F. Richardson, N. Dehak, and D. Sturim, “The MITLL NIST LRE 2011 language recognition system,” in Proc. IEEE Odyssey, 2011, pp. 209–215.
  • [22] “The 2009 NIST language recognition evaluation plan,” 2009 [Online]. Available: http://www.itl.nist.gov/iad/mig/tests/lre/2009/
  • [23] P. Torres-Carrasquillo, E. Singer, T. Gleason, A. McCree, D. A. Reynolds, F. Richardson, and D. Sturim, “The MITLL NIST LRE 2009 language recognition system,” in Proc. ICASSP, 2010, pp. 4994–4997.
  • [24] B. Xiang, U. Chaudhari, J. Navratil, G. Ramaswamy, and R. Gopinath, “Short-time Gaussianization for robust speaker verification,” in Proc. ICASSP, 2002.
  • [25] J. Godfrey, E. Holliman, and J. McDaniel, “SWITCHBOARD: Telephone speech corpus for research and development,” in Proc. ICASSP, 1992, pp. 517–520.
  • [26] D. Povey et al., “The Kaldi speech recognition toolkit,” in Proc. IEEE ASRU, 2011.
  • [27] “The 2011 NIST language recognition evaluation plan,” 2011 [Online]. Available: http://www.nist.gov/itl/iad/mig/lre11.cfm
  • [28] S. H. Shum, D. A. Reynolds, D. Garcia-Romero, and A. McCree, “Unsupervised clustering approaches for domain adaptation in speaker recognition systems,” in Proc. IEEE Odyssey, 2014, pp. 265–272.
  • [29] H. Aronowitz, “Inter dataset variability compensation for speaker recognition,” in Proc. ICASSP, 2014.
  • [30] O. Glembek, J. Ma, P. Matejka, B. Zhang, O. Plchot, L. Burget, and S. Matsoukas, “Domain adaptation via within-class covariance correction in i-vector based speaker recognition systems,” in Proc. ICASSP, 2014.
  • [31] H. Hermansky, D. P. W. Ellis, and S. Sharma, “Tandem connectionist feature extraction for conventional HMM systems,” in Proc. ICASSP, 2000.