Improving ASR Performance with OCR Through Using Word Frequency Difference.

Kyudan Jung, Seungmin Bae, Nam Joon Kim, Hyun Gon Ryu, Hyuk-Jae Lee

International Conference on Electronics, Information and Communications (2024)

Abstract
Interest in conversational artificial intelligence (AI) has grown recently, and automatic speech recognition (ASR) is being actively researched to facilitate interaction between humans and machines. This paper proposes a system that enhances ASR performance. The proposed method captures images from lecture videos in real time every 30 seconds and accumulates them. The frequency ratios between the text data from the captured images and text data calculated offline from over 333K are used to improve the ASR performance. Experimental results showed that the word error rate (WER) decreased by up to 0.68% compared with using the traditional ASR alone. In particular, the recognition rate for specialized terms frequently used in lectures improved by 64%.
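The abstract does not say how the frequency ratios enter the ASR pipeline, so the following is only a minimal sketch under stated assumptions. It assumes that the ratio for a word is its relative frequency in the OCR text divided by its relative frequency in the offline text data, and that the ratios are applied by rescoring an n-best list of ASR hypotheses. The function names, the `floor` and `weight` parameters, and the rescoring step are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Return the relative frequency of each lowercase word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def frequency_ratios(ocr_text, background_freq, floor=1e-6):
    """Ratio of OCR-text frequency to background frequency for each OCR word.

    `floor` (an assumption) avoids division by zero for words that are
    missing from the background frequencies.
    """
    ocr_freq = word_frequencies(ocr_text)
    return {
        word: freq / max(background_freq.get(word, 0.0), floor)
        for word, freq in ocr_freq.items()
    }


def rescore(hypotheses, ratios, weight=0.5):
    """Pick the best (text, score) hypothesis after adding a log-ratio bonus.

    Hypothetical use of the ratios: each word that also appeared on screen
    adds weight * log(ratio) to the hypothesis score.
    """
    def boosted(item):
        text, score = item
        words = re.findall(r"[a-z']+", text.lower())
        bonus = sum(math.log(ratios[w]) for w in words if w in ratios)
        return score + weight * bonus

    return max(hypotheses, key=boosted)[0]


# Illustrative values only; none of these numbers come from the paper.
background = {"the": 0.05, "attention": 0.001, "transformer": 0.0001}
ratios = frequency_ratios("transformer attention transformer the", background)
n_best = [("the transform are model", -1.0), ("the transformer model", -1.2)]
best = rescore(n_best, ratios)
```

In this toy input, a term that is frequent on the slides but rare in the background data receives a large ratio, which lets the hypothesis containing it overtake one with a slightly better original score.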