Efficient Text-Only Domain Adaptation For CTC-Based ASR.

Chang Chen, Xun Gong, Yanmin Qian

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (2023)

For automatic speech recognition (ASR) models based on connectionist temporal classification (CTC), text-only domain adaptation still faces several challenges. In this study, we propose an efficient text-only domain adaptation method for CTC-based models. We introduce the assistant textual adapter (ATA), which learns textual features and transforms them into the latent space of the acoustic encoder. With the help of the ATA module, adaptation is achieved by fine-tuning the top layers of the acoustic encoder on target-domain text. Further improvement can be obtained by integrating shallow fusion (SF). With models adapted from LibriSpeech, experiments show that the proposed method achieves an average 29.7% relative word error rate (WER) reduction (WERR) over the un-adapted baseline on WSJ, and a 10.5% WERR over SF. Moreover, it yields 15.4% to 37.1% WERR on the test sets of 10 GigaSpeech target domains compared with the un-adapted baseline, and 6.5% WERR on average compared with SF.
text-only domain adaptation, connectionist temporal classification, end-to-end speech recognition
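
The abstract reports results as relative WER reduction (WERR) and uses shallow fusion (SF) as both a comparison point and an add-on. A minimal sketch of both quantities follows, using their standard definitions: WERR as the baseline WER minus the adapted WER, divided by the baseline WER, and shallow fusion as the ASR log-probability plus a weighted external language model log-probability. The function names, the `lm_weight` parameter, and the toy hypotheses are assumptions made for illustration only; the abstract does not give the authors' fusion weight or decoding setup.

```python
from typing import List, Tuple


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction (WERR) in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


def shallow_fusion_score(asr_log_prob: float, lm_log_prob: float,
                         lm_weight: float) -> float:
    """Combine ASR and external LM log-probabilities (standard shallow fusion)."""
    return asr_log_prob + lm_weight * lm_log_prob


def pick_best(hypotheses: List[Tuple[str, float, float]],
              lm_weight: float) -> str:
    """Pick the hypothesis text with the highest fused score.

    Each hypothesis is (text, asr_log_prob, lm_log_prob); values are hypothetical.
    """
    best = max(hypotheses,
               key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight))
    return best[0]


if __name__ == "__main__":
    # Hypothetical WER values, not taken from the paper.
    print(relative_wer_reduction(baseline_wer=10.0, adapted_wer=7.0))

    # Toy n-best list: the LM prefers the second hypothesis.
    nbest = [("a", -1.0, -5.0), ("b", -1.5, -1.0)]
    print(pick_best(nbest, lm_weight=0.5))
```

With `lm_weight` set to 0 the selection falls back to the ASR score alone, which is one way to see what the external language model contributes.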