Partial Rewriting for Multi-Stage ASR
CoRR(2023)
Abstract
For many streaming automatic speech recognition tasks, it is important to
provide timely intermediate streaming results, while refining a high quality
final result. This can be done using a multi-stage architecture, where a small
left-context-only model creates streaming results and a larger left- and
right-context model produces a final result at the end. While this
significantly improves the quality of the final results without compromising
the streaming emission latency of the system, the streaming results do not benefit
from these quality improvements. Here, we propose a text manipulation
algorithm that merges the streaming outputs of both models. We improve the
quality of streaming results by around 10%, without altering the final results.
Our approach introduces no additional latency and reduces flickering. It is
also lightweight, requires no model retraining, and can be applied
to a wide variety of multi-stage architectures.
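The abstract does not spell out the merging rule. As a purely illustrative sketch, assuming both models emit word-level timestamps, partial rewriting can be pictured as replacing the prefix that the lagging second stage has already covered while keeping the first stage's newer tail. The `Word` type, the `merge_partial` function and the midpoint rule for boundary words are hypothetical choices made for this example, not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One hypothesis word with hypothetical start/end timestamps in seconds."""
    text: str
    start: float
    end: float


def merge_partial(first_pass: list[Word], second_pass: list[Word]) -> list[Word]:
    """Merge a lagging second-pass hypothesis into the streaming first-pass one.

    Assumption (not from the paper): the second pass has only processed audio
    up to the end of its last emitted word.
    """
    if not second_pass:
        # Nothing rewritten yet: show the streaming hypothesis unchanged.
        return list(first_pass)
    boundary = second_pass[-1].end
    # Keep first-pass words whose midpoint lies past the rewritten region;
    # earlier first-pass words are replaced by the second-pass prefix.
    tail = [w for w in first_pass if (w.start + w.end) / 2 >= boundary]
    return list(second_pass) + tail


# Usage: the second pass has corrected the first 0.7 s of audio.
first = [Word("wreck", 0.0, 0.3), Word("a", 0.3, 0.4),
         Word("nice", 0.4, 0.7), Word("beach", 0.7, 1.0)]
second = [Word("recognize", 0.0, 0.7)]
print(" ".join(w.text for w in merge_partial(first, second)))  # recognize beach
```

In this sketch only the already-covered prefix changes, and the newest words still come from the first pass, so the merged result never waits on the second stage.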