Streamed Punctuation Annotation using Transformers

Semantic Scholar (2021)

Abstract
To improve readability, punctuation prediction is typically performed on the text output of an Automatic Speech Recognition (ASR) model. We introduce a Transformer-based model that predicts punctuation marks on unpunctuated text streamed word by word, as is often the case for ASR output. We propose a decoding strategy that, in cases of uncertainty, delays the insertion of punctuation marks until a confidence threshold is reached. Leveraging existing pre-trained language models together with a special token for acoustic pause features, we achieve state-of-the-art performance for punctuation prediction on the MGB dataset and results that compare favourably to the state of the art on the IWSLT11 dataset, while using less computing power than previous work thanks to downsampling. To make the model viable for real-time use with an ASR system and on low-resource devices, we evaluate input truncation and weight quantization. We show that these techniques lead to faster-than-real-time inference speeds and a significant reduction in model size.
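The delayed decoding strategy described in the abstract can be illustrated with a small sketch. The code below is a hypothetical reconstruction, not the authors' implementation: `predict` stands in for the Transformer (its interface is an assumption), and the punctuation inventory, threshold, and maximum delay are illustrative parameters. The idea is that a punctuation decision for a token is committed only once the model's most probable label exceeds a confidence threshold; otherwise the token stays pending until more right-context has streamed in, with a cap on how long a decision may be deferred.

```python
# Hedged sketch of threshold-delayed punctuation decoding for streamed text.
# "predict" is a stand-in for the Transformer model: given the token context,
# it returns one probability distribution over PUNCT per token (an assumed
# interface, not the paper's actual API).

PUNCT = ["", ",", "."]  # "" means: no punctuation after the token


def stream_punctuate(tokens, predict, threshold=0.9, max_delay=3):
    """Attach punctuation to tokens arriving one at a time.

    A mark is committed only when its probability reaches `threshold`;
    uncertain decisions are deferred (up to `max_delay` tokens) so the
    model can exploit additional right-context as it streams in.
    """
    committed, out, pending = [], [], []
    for tok in tokens:
        pending.append(tok)
        probs = predict(committed + pending)  # re-score with new context
        while pending:
            idx = len(committed)              # oldest undecided position
            dist = probs[idx]
            best = max(range(len(PUNCT)), key=dist.__getitem__)
            if dist[best] >= threshold or len(pending) > max_delay:
                committed.append(pending.pop(0))
                out.append(committed[-1] + PUNCT[best])
            else:
                break                          # wait for more context
    # End of stream: force decisions for anything still pending.
    probs = predict(committed + pending)
    while pending:
        dist = probs[len(committed)]
        best = max(range(len(PUNCT)), key=dist.__getitem__)
        committed.append(pending.pop(0))
        out.append(committed[-1] + PUNCT[best])
    return out
```

For example, a stub predictor that only becomes confident about a full stop after "world" once the following word has arrived would leave "world" pending for one step and then emit "world." retroactively, which is exactly the trade-off between latency and accuracy the decoding strategy targets.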