End-to-End Speech Recognition Technology Based on Multi-Stream CNN.

Hao Xiao, Yuan Qiu, Rong Fei, Xiongbo Chen, Zuo Liu, Zongling Wu

TrustCom (2022)

Abstract
As end-to-end speech recognition grows in popularity, we survey several end-to-end speech technologies and study the Transformer-based speech framework. We find that its multi-head attention is not effective at capturing local features and that, under the noise found in real scenes, training converges too slowly. To address these problems, we propose a new speech recognition framework based on the MCNN-Transformer-CTC method. In the pre-acoustic unit, an MCNN (multi-stream convolutional neural network) extracts local features through multiple parallel channels in terms of time width and spectral capability, which compensates for the weakness of the self-attention mechanism in local feature extraction. A CTC structure is added through multi-task learning to address the slow training convergence. Trained on the Aishell1 dataset, the model reaches a character error rate (CER) of 6.23%, a further improvement over the Transformer model.
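The two ideas in the abstract, parallel convolution streams for local features and a CTC term added through multi-task learning, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the fixed 1-D kernels of different widths, and the interpolation weight `ctc_weight` (with its 0.3 default) are all assumptions made for illustration. A real MCNN would use learned convolutions over time-frequency features.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide one kernel along time with zero padding; output length equals input length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes odd kernel widths")
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[t + i] * kernel[i] for i in range(k))
        for t in range(len(signal))
    ]


def multi_stream_features(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Run each kernel as a parallel stream and concatenate the outputs per frame.

    Assumption: streams differ only in kernel width (temporal context).
    """
    streams = [conv1d_same(signal, kernel) for kernel in kernels]
    return [list(frame) for frame in zip(*streams)]


def joint_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention-decoder loss (a common multi-task form).

    Assumption: the source does not state the weighting; 0.3 is a placeholder.
    """
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0, 4.0]
    # Two hypothetical streams: width 1 (identity) and width 3 (local sum).
    print(multi_stream_features(frames, [[1.0], [1.0, 1.0, 1.0]]))
    print(joint_loss(2.0, 4.0, ctc_weight=0.5))
```

Each output frame carries one value per stream, so a downstream Transformer encoder would see local context at several widths at once. The interpolated loss lets the CTC branch contribute to the training signal alongside the attention decoder.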
Keywords
Speech Recognition, MCNN, Transformer, CTC