A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
A recent paradigm shift in artificial intelligence has seen the rise of foundation models, such as large language models and universal speech models. Having billions of parameters and being trained on a wide range of data, these foundation models are expected to generalize better to different downstream tasks. Efficient adaptation is the key to leveraging these foundation models in a new task or domain. In this paper, we compare several popular parameter-efficient tuning methods, such as vector adaptation, residual adapters, low-rank adapter (LoRA) and prompt-tuning, for automatic speech recognition (ASR) domain adaptation. We use a connectionist temporal classification (CTC) model with a Conformer encoder and fuse it with a universal language model. We study the effect of adapting either or both of the Conformer encoder and the universal language model. We carry out extensive experiments to study these methods under different hyper-parameter settings and the effect of combining some of these methods. We find that combining vector adaptation and residual adapters with increasing bottleneck dimension achieves the best performance.
Keywords
Parameter-efficient adaptation, foundation model, universal speech model
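
To make two of the methods named in the abstract concrete, the following is a minimal NumPy sketch of a residual (bottleneck) adapter and a LoRA-style low-rank update applied around a single frozen linear layer. It is an illustration under stated assumptions, not the configuration used in the paper: the layer sizes, the ReLU non-linearity, the zero initialisation, and the function names are all hypothetical, and practical adapters usually also include layer normalisation and bias terms.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, bottleneck, rank = 16, 4, 2  # hypothetical sizes, chosen for illustration

# Frozen pretrained weight of one linear layer (stands in for a foundation-model layer).
W_frozen = rng.standard_normal((d_model, d_model))


def residual_adapter(x, W_down, W_up):
    """Bottleneck adapter: x + up(relu(down(x))). Only W_down and W_up are trained."""
    return x + np.maximum(x @ W_down, 0.0) @ W_up


def lora_linear(x, W, A, B, scale=1.0):
    """LoRA-style layer: frozen W plus a trainable low-rank update A @ B."""
    return x @ W + scale * (x @ A) @ B


# Zero-initialising the up-projection (and LoRA's B) makes both methods start as
# an exact no-op, so adaptation begins from the pretrained model's behaviour.
W_down = rng.standard_normal((d_model, bottleneck)) * 0.01
W_up = np.zeros((bottleneck, d_model))
A = rng.standard_normal((d_model, rank)) * 0.01
B = np.zeros((rank, d_model))

x = rng.standard_normal((3, d_model))  # a batch of 3 feature vectors
adapter_out = residual_adapter(x, W_down, W_up)
lora_out = lora_linear(x, W_frozen, A, B)

# Trainable parameter counts versus fine-tuning the full layer.
adapter_params = W_down.size + W_up.size  # 2 * d_model * bottleneck
lora_params = A.size + B.size             # 2 * d_model * rank
full_params = W_frozen.size               # d_model ** 2
```

In this sketch the bottleneck dimension and the LoRA rank are the hyper-parameters that trade adaptation capacity against the number of trainable parameters, which is the kind of setting the abstract says the experiments vary.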