Intrinsic Spectral Analysis For Zero And High Resource Speech Recognition
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3(2012)
摘要
The constraints of the speech production apparatus imply that our vocalizations are approximately restricted to a low-dimensional manifold embedded in a high-dimensional space. Manifold learning algorithms provide a means to recover the approximate embedding from untranscribed data and enable use of the manifold's intrinsic distance metric to characterize acoustic similarity for downstream automatic speech applications. In this paper, we consider a previously unevaluated nonlinear out-of-sample extension for intrinsic spectral analysis (ISA), investigating its performance in both unsupervised and supervised tasks. In the zero resource regime, where the lack of transcribed resources forces us to rely solely on the phonetic salience of the acoustic features themselves, ISA provides substantial gains relative to canonical acoustic front-ends. When large amounts of transcribed speech for supervised acoustic model training are also available, we find that the data-driven intrinsic spectrogram matches the performance of and is complementary to these signal processing derived counterparts.
更多查看译文
关键词
intrinsic spectral analysis,manifold learning,speech recognition,zero resource
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络