Hierarchical Phone Recognition with Compositional Phonetics.

Interspeech(2021)

引用 2|浏览28
暂无评分
摘要
There is growing interest in building phone recognition systems for low-resource languages as the majority of languages do not have any writing systems. Phone recognition systems proposed so far typically derive their phone inventory from the training languages, therefore the derived inventory could only cover a limited number of phones existing in the world. It fails to recognize unseen phones in low-resource or zero-resource languages. In this work, we tackle this problem with a hierarchical model, in which we explicitly model three different entities in a hierarchical manner: phoneme, phone, and phonological articulatory attributes. In particular, we decompose phones into articulatory attributes and compute the phone embedding from the attribute embedding. The model would first predict the distribution over the phones using their embeddings, next, the language-independent phones are aggregated to the language-dependent phonemes and then optimized by the CTC loss. This compositional approach enables us to recognize phones even they do not appear in the training set. We evaluate our model on 47 unseen languages and find the proposed model outperforms baselines by 13.1% PER.
更多
查看译文
关键词
multilingual speech recognition,phone recognition,zero-shot learning,phonetics
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要