Hybrid Syllable and Character Representations for Mandarin ASR

Fengrun Zhang, Chengfei Li, Shuhao Deng, Yaoping Wang, Jinfeng Bai

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023

Abstract
With the development of deep learning, end-to-end (E2E) automatic speech recognition (ASR) based on Connectionist Temporal Classification (CTC) and attention has achieved great success and become the most popular approach. The choice of modeling units is critical in speech recognition. For Mandarin, the modeling units are usually Chinese characters. However, homophones and polyphonic characters are very common in Chinese, which degrades ASR performance. Pinyin can be regarded as the syllable-level representation of Chinese characters and reflects their pronunciation. In E2E ASR, because of the sequence-to-sequence formulation, Chinese characters are mapped directly to acoustic features with no intermediate-level representation. In this paper, we introduce pinyin with tones as an auxiliary modeling unit to compensate for the mismatch between Chinese characters and acoustic features. Building on hybrid modeling of syllables and Chinese characters, we propose a multi-task ASR model that adds a syllable CTC decoder and a syllable-to-character attention decoder to the joint CTC-attention model. Furthermore, we propose a syllable-auxiliary attention-rescoring method. Compared with the character-based ASR model, our method achieves a relative character error rate (CER) reduction of 8.6% with greedy search and 9.4% with attention rescoring on Aishell-1.
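The abstract does not spell out how the syllable-auxiliary attention rescoring combines its scores. As a rough illustration only, the sketch below assumes a weighted sum of three log-probabilities per n-best hypothesis: the character CTC score, the character attention-decoder score, and a score from the syllable-to-character attention decoder. The `Hypothesis` class, the `rescore` function, the weights, and the toy scores are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One n-best candidate with per-decoder log-probabilities (toy values)."""
    text: str
    ctc_score: float      # character CTC log-probability
    att_score: float      # character attention-decoder log-probability
    syl_att_score: float  # assumed syllable-to-character decoder log-probability


def rescore(nbest, ctc_weight=0.5, syl_weight=0.3):
    """Return the hypothesis with the highest combined score.

    The linear combination and both weights are assumptions made for
    illustration; the paper's actual rescoring rule may differ.
    """
    def combined(h):
        return (h.att_score
                + ctc_weight * h.ctc_score
                + syl_weight * h.syl_att_score)

    return max(nbest, key=combined)


# Toy n-best list: the syllable-aware score favours the second candidate.
nbest = [
    Hypothesis("hyp_a", ctc_score=-4.0, att_score=-3.0, syl_att_score=-6.0),
    Hypothesis("hyp_b", ctc_score=-4.2, att_score=-3.1, syl_att_score=-2.0),
]

best_with_syllables = rescore(nbest)
best_without_syllables = rescore(nbest, syl_weight=0.0)
```

With these toy numbers, setting `syl_weight=0.0` falls back to ordinary CTC plus attention rescoring and selects a different candidate, which is the kind of effect an auxiliary syllable score is meant to have.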