Low-resource speech recognition and dialect identification of Irish in a multi-task framework
arXiv (2024)
Abstract
This paper explores the use of Hybrid CTC/Attention encoder-decoder models
trained with Intermediate CTC (InterCTC) for Irish (Gaelic) low-resource speech
recognition (ASR) and dialect identification (DID). Results are compared to the
current best performing models trained for ASR (TDNN-HMM) and DID (ECAPA-TDNN).
An optimal InterCTC setting is initially established using a Conformer encoder.
This setting is then used to train a model with an E-branchformer encoder and
the performance of both architectures is compared. A multi-task fine-tuning
approach is adopted for language model (LM) shallow fusion. The experiments
yielded an improvement in DID accuracy of 10.8% relative to a baseline
ECAPA-TDNN, and WER performance approaching the TDNN-HMM model. This multi-task
approach emerges as a promising strategy for Irish low-resource ASR and DID.
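
As a rough illustration of how a Hybrid CTC/Attention objective with InterCTC is commonly assembled, the sketch below combines the three loss terms. It assumes the standard formulation from the literature (a weighted sum of attention and CTC losses, with the CTC term itself interpolated between the final-layer loss and the mean of the intermediate-layer losses). The function name, argument names, and default weights are hypothetical and are not taken from the paper.

```python
def hybrid_interctc_loss(att_loss, ctc_loss, inter_ctc_losses,
                         ctc_weight=0.3, inter_weight=0.5):
    """Combine attention, final-layer CTC, and intermediate CTC losses.

    All names and default weights here are illustrative assumptions,
    not values reported in the paper.
    """
    if inter_ctc_losses:
        # Average the CTC losses computed at the chosen intermediate layers.
        inter = sum(inter_ctc_losses) / len(inter_ctc_losses)
        # Interpolate between the final-layer CTC loss and the intermediate ones.
        ctc_total = (1.0 - inter_weight) * ctc_loss + inter_weight * inter
    else:
        # Without intermediate layers this reduces to plain hybrid CTC/attention.
        ctc_total = ctc_loss
    # Standard hybrid objective: weighted sum of CTC and attention losses.
    return ctc_weight * ctc_total + (1.0 - ctc_weight) * att_loss
```

In a real training loop the inputs would be per-batch loss tensors rather than floats; the arithmetic is the same.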