Improving ASR Performance with OCR Through Using Word Frequency Difference.
International Conference on Electronics, Information and Communications (2024)
Abstract
Interest in conversational artificial intelligence (AI) has grown recently, and automatic speech recognition (ASR) is being actively researched to facilitate interaction between humans and machines. This paper proposes a system that enhances ASR performance. The proposed method captures images from lecture videos in real time every 30 seconds and accumulates them. The frequency ratios between text data from the captured images and text data calculated offline from over 333K are used to improve ASR performance. Experimental results show that the word error rate (WER) decreased by up to 0.68% compared with using only the traditional ASR. In particular, the recognition rate for specialized terms frequently used in lectures improved by 64%.
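As a rough illustration of the frequency-ratio idea, the sketch below compares how often each word appears in text extracted from captured frames with how often it appears in a background corpus. It is not the authors' implementation: the function names, the tokenization rule, and the `floor` smoothing value are all assumptions made for this example, and the abstract does not specify how the ratios are fed back into the ASR system.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each lowercased word to its share of all words in `text`."""
    # Assumed tokenization: runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def frequency_ratios(ocr_text, background_text, floor=1e-6):
    """Ratio of each word's frequency in OCR text to its background frequency.

    `floor` (an assumed value) replaces a zero or tiny background frequency
    so that words absent from the background corpus do not divide by zero.
    """
    ocr_freq = relative_frequencies(ocr_text)
    background_freq = relative_frequencies(background_text)
    return {
        word: freq / max(background_freq.get(word, 0.0), floor)
        for word, freq in ocr_freq.items()
    }


def top_terms(ratios, k):
    """Return the k words with the highest frequency ratio."""
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy inputs standing in for accumulated slide text and an offline corpus.
    slide_text = "gradient descent updates the gradient"
    corpus_text = "the cat sat on the mat the end"
    ratios = frequency_ratios(slide_text, corpus_text)
    print(top_terms(ratios, 1))
```

Words that are common on the slides but rare in general text receive a high ratio, which is one way specialized lecture terms could be singled out for preferential treatment during recognition.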