A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2024)

A recent paradigm shift in artificial intelligence has seen the rise of foundation models, such as large language models and universal speech models. With billions of parameters and trained on a wide range of data, these foundation models are expected to generalize better to different downstream tasks. Efficient adaptation is the key to leveraging these foundation models in a new task or domain. In this paper, we compare several popular parameter-efficient tuning methods, such as vector adaptation, residual adapters, low-rank adaptation (LoRA) and prompt-tuning, for automatic speech recognition (ASR) domain adaptation. We use a connectionist temporal classification (CTC) model with a Conformer encoder and fuse it with a universal language model. We study the effect of adapting either or both of the Conformer encoder and the universal language model. We carry out extensive experiments to study these methods under different hyper-parameter settings and the effect of combining some of them. We find that combining vector adaptation and residual adapters with increasing bottleneck dimension achieves the best performance.
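Two of the compared techniques, residual adapters and LoRA, can be illustrated in a few lines. The sketch below is not the paper's implementation; it is a minimal NumPy rendering of the standard formulations, with assumed names (`residual_adapter`, `lora_linear`) and a down/up bottleneck projection for the adapter and a scaled low-rank update for LoRA.

```python
import numpy as np

def residual_adapter(x, W_down, W_up):
    """Residual adapter: project to a small bottleneck, apply a
    nonlinearity, project back up, and add the input (residual)."""
    h = np.maximum(x @ W_down, 0.0)  # down-projection + ReLU
    return x + h @ W_up              # up-projection + residual add

def lora_linear(x, W, A, B, alpha=1.0):
    """LoRA: frozen base weight W plus a trainable low-rank update
    A @ B, scaled by alpha / rank."""
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A @ B)

# Illustrative shapes: model dim 8, bottleneck / rank 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))
W = rng.normal(size=(8, 8))
A = rng.normal(size=(8, 2))
B = np.zeros((2, 8))  # zero-init B so LoRA starts as the identity update
```

With `B` initialized to zero, `lora_linear(x, W, A, B)` equals the frozen layer `x @ W`, which is why LoRA training can start from the base model's behavior; the adapter similarly reduces to the identity when `W_up` is zero.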
Parameter-efficient adaptation, foundation model, universal speech model