Finer-grained Modeling Units-based Meta-Learning for Low-resource Tibetan Speech Recognition

Conference of the International Speech Communication Association (INTERSPEECH), 2022

Abstract
Tibetan is a typical under-resourced language due to its relatively small speaker population. Although character-based end-to-end (E2E) automatic speech recognition (ASR) models with transfer learning and multilingual training strategies have mitigated the low-resource problem, they often suffer from over-fitting. Recently, meta-learning has proven effective at alleviating over-fitting. However, the widely used coarse-grained modeling units are not closely correlated with their pronunciation, which limits the performance of low-resource ASR systems. Furthermore, meta-learning consists of a meta-training period followed by fast self-adaptation on the target language, and the meta-training period lacks target-language-specific information. Therefore, we propose a novel E2E low-resource Lhasa dialect ASR model based on finer-grained modeling units and transfer learning, drawing on the properties of Chinese Pinyin. Chinese Pinyin and decomposed Tibetan radicals are more closely related to pronunciation than characters are, and can thus provide additional acoustic information in low-resource settings. In addition, Tibetan modeling units are used in both the meta-training and fast self-adaptation processes to supply language-specific information and alleviate the low-resource problem. Experiments show that our proposed method achieves a 54.9% relative character error rate reduction over the baseline system.
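To make the meta-training/fast self-adaptation workflow concrete, below is a minimal first-order MAML-style sketch of the kind of procedure the abstract describes: meta-training on source-language tasks (e.g. Mandarin with Pinyin units) followed by fast adaptation on Tibetan data with finer-grained units. This is an illustrative approximation, not the authors' implementation; the function names (inner_adapt, meta_train), task format, loss function, and hyper-parameters are all placeholder assumptions.

import copy
import torch


def inner_adapt(model, support_batch, loss_fn, lr=1e-3, steps=1):
    """Clone the shared initialisation and take a few gradient steps on
    one language's support batch (the 'fast self-adaptation' inner loop).
    For Tibetan, support_batch would pair acoustic features with
    finer-grained (radical-level) targets; details are assumptions."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        x, y = support_batch
        loss = loss_fn(adapted(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted


def meta_train(model, language_tasks, loss_fn, meta_lr=1e-4, epochs=10):
    """First-order MAML outer loop: the query-set gradient of each
    adapted model is applied back onto the shared initialisation."""
    meta_opt = torch.optim.Adam(model.parameters(), lr=meta_lr)
    for _ in range(epochs):
        for support, query in language_tasks:  # one (support, query) task per language
            adapted = inner_adapt(model, support, loss_fn)
            x_q, y_q = query
            query_loss = loss_fn(adapted(x_q), y_q)
            grads = torch.autograd.grad(query_loss, adapted.parameters())
            meta_opt.zero_grad()
            for p, g in zip(model.parameters(), grads):
                p.grad = g.detach()  # first-order approximation: ignore second-order terms
            meta_opt.step()
    return model

After meta-training on the source-language tasks, the same inner_adapt routine would be run on the Tibetan (Lhasa dialect) data to obtain the final target-language model; the abstract's contribution of including Tibetan modeling units in the meta-training tasks themselves would amount to adding Tibetan (support, query) pairs to language_tasks.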
Keywords
speech recognition, finer-grained, units-based, meta-learning, low-resource