Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
arxiv(2024)
摘要
End-to-end (E2E) approach is gradually replacing hybrid models for automatic
speech recognition (ASR) tasks. However, the optimization of E2E models lacks
an intuitive method for handling decoding shifts, especially in scenarios with
a large number of domain-specific rare words that hold specific important
meanings. Furthermore, the absence of knowledge-intensive speech datasets in
academia has been a significant limiting factor, and the commonly used speech
corpora exhibit significant disparities with realistic conversation. To address
these challenges, we present Medical Interview (MED-IT), a multi-turn
consultation speech dataset that contains a substantial number of
knowledge-intensive named entities. We also explore methods to enhance the
recognition performance of rare words for E2E models. We propose a novel
approach, post-decoder biasing, which constructs a transform probability matrix
based on the distribution of training transcriptions. This guides the model to
prioritize recognizing words in the biasing list. In our experiments, for
subsets of rare words appearing in the training speech between 10 and 20 times,
and between 1 and 5 times, the proposed method achieves a relative improvement
of 9.3
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要