Semi-supervised Acoustic and Language Modeling for Hindi ASR

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Abstract
This paper describes our team's submission to the Hindi Gram Vaani ASR challenge, which involves building an ASR system for spontaneous telephonic recordings. The challenge is distinctive because only a small amount of labelled data is available for model development. Acoustic variability makes it harder still: the corpus contains spontaneous natural conversation, the rich diversity of Hindi spoken across India, and varied backgrounds. We participated in two of the three tracks. The first track provides only 100 hours of labelled speech; the second adds a 1000-hour unlabelled corpus to the 100 hours of labelled speech. For the first and second tracks, we developed a Kaldi-based hybrid model comprising a character-based TDNNF acoustic model, N-gram first-pass decoding, RNN-LM rescoring, and system combination. For the second track, we also trained an end-to-end (E2E) Conformer-based system on representations obtained from a contrastive predictive coding (CPC) model. On the development set, which consists of 5 hours of audio, the results for both tracks are significantly better than the baseline results published by the challenge organizers.
Keywords
language modeling, acoustic, semi-supervised
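
The abstract mentions N-gram first-pass decoding followed by RNN-LM rescoring. The general idea can be illustrated with a minimal n-best rescoring sketch. This is not the authors' implementation: Kaldi typically rescores lattices rather than n-best lists, and the `Hypothesis` class, the `rescore` function, the log-linear interpolation, the weights, and all scores below are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One entry of an n-best list (all scores are log-probabilities)."""
    text: str
    acoustic_logp: float  # acoustic model score
    ngram_logp: float     # first-pass N-gram LM score
    rnnlm_logp: float     # neural LM score computed in the second pass


def rescore(nbest, lm_scale=1.0, rnn_weight=0.5):
    """Re-rank hypotheses after interpolating the two LM scores.

    Assumption: a simple weighted sum of the N-gram and RNN-LM
    log-probabilities; rnn_weight=0.0 reproduces the first-pass ranking.
    """
    def total(h):
        lm = (1.0 - rnn_weight) * h.ngram_logp + rnn_weight * h.rnnlm_logp
        return h.acoustic_logp + lm_scale * lm

    return sorted(nbest, key=total, reverse=True)


# Made-up scores: the N-gram LM prefers "hyp a", the RNN-LM prefers "hyp b".
nbest = [
    Hypothesis("hyp a", acoustic_logp=-10.0, ngram_logp=-6.0, rnnlm_logp=-9.0),
    Hypothesis("hyp b", acoustic_logp=-10.5, ngram_logp=-6.5, rnnlm_logp=-4.0),
]

first_pass_best = rescore(nbest, rnn_weight=0.0)[0].text
rescored_best = rescore(nbest, rnn_weight=0.5)[0].text
print(first_pass_best, "->", rescored_best)
```

With these made-up scores, the first-pass ranking picks "hyp a", and giving the RNN-LM half of the language-model weight moves "hyp b" to the top. In practice the interpolation weight and LM scale would be tuned on a development set.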