Effective Fine-tuning Method for Tibetan Low-resource Dialect Speech Recognition

Jiahao Yang, Jianguo Wei, Kuntharrgyal Khysru, Junhai Xu, Wenhuan Lu, Wenjun Ke, Xiaokang Yang

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2023)

Abstract
Tibetan is a distinctive and culturally rich language spoken by millions of people across the Tibetan Plateau and surrounding regions. Applying speech recognition technology to Tibetan has special significance for preserving linguistic diversity and fostering cultural integration. Tibetan, however, comprises a multitude of distinct dialects, which hinders the reuse of speech recognition models across them. In low-resource dialect tasks, conventional approaches transfer well-trained models from linguistically similar languages to the target. Recent studies have shown, however, that indiscriminately fine-tuning all parameters can disrupt the feature extractor of the pre-trained model, leading to catastrophic forgetting. This paper introduces a fine-tuning method grounded in model adaptation. To train automatic speech recognition (ASR) models under limited training data and cross-dialect transfer, our approach refines only a select group of language-specific parameters, yielding robust performance. These parameters are indicated by a sparse binary mask with the same shape as the model's parameters, so no additional parameters are introduced. Experiments on two downstream low-resource Tibetan dialects show that the proposed method outperforms both traditional fine-tuning and adapter-based fine-tuning.