Knowledge Distillation From Offline to Streaming Transducer: Towards Accurate and Fast Streaming Model by Matching Alignments.

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Abstract
The sequence transducer is a popular end-to-end automatic speech recognition model for streaming scenarios. However, there is a trade-off between accuracy and latency: latency regularization methods such as FastEmit can reduce latency, but the more aggressively they reduce it, the worse the accuracy tends to become. Conversely, knowledge distillation (KD) is used only to improve accuracy, and latency is not considered. In this paper, we propose an effective method that combines FastEmit with KD to reduce latency while transferring the accuracy of the offline model, exploiting a setting in which the latency gap between the offline and streaming models is small. The method shrinks this latency gap by applying FastEmit to both the offline and streaming models. Experimental results on the LibriSpeech dataset show that the model with the best accuracy-latency trade-off achieves a relative error reduction of 7.5% and reduces latency by 130 ms compared with the streaming Conformer transducer.
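The core of the method, as described in the abstract, is a distillation term that pushes the streaming student to match the lattice-level alignments of a FastEmit-trained offline teacher. The following is a minimal sketch of what such a KD term could look like in PyTorch; the function names, tensor shapes, temperature, and kd_weight are illustrative assumptions rather than the paper's actual implementation, and the transducer loss is taken from torchaudio (torchaudio.functional.rnnt_loss).

```python
import torch
import torch.nn.functional as F
import torchaudio


def lattice_kd_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    """Hypothetical KD term: KL divergence between the teacher's and the
    student's output distributions at every node of the transducer lattice.
    Both logit tensors are assumed to have shape [batch, T, U, vocab]."""
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    # Sum the KL over lattice positions and vocabulary, average over the batch.
    return F.kl_div(s_logp, t_prob, reduction="batchmean") * temperature ** 2


def student_loss(student_logits, teacher_logits, targets,
                 logit_lengths, target_lengths,
                 blank: int = 0, kd_weight: float = 0.1) -> torch.Tensor:
    """Transducer loss on the student plus the distillation term.
    kd_weight is an assumed hyperparameter, not a value from the paper."""
    rnnt = torchaudio.functional.rnnt_loss(
        student_logits, targets, logit_lengths, target_lengths, blank=blank
    )
    return rnnt + kd_weight * lattice_kd_loss(student_logits, teacher_logits)
```

In this sketch the teacher logits would come from a FastEmit-regularized offline model run on the same utterances, so the student is matched against alignments that are already shifted toward earlier emission; FastEmit itself would be applied during the training of both the offline and streaming models, as the abstract states.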
Keywords
speech recognition,streaming,transducer,knowledge distillation