Review of Various Machine Learning and Deep Learning Techniques for Audio Visual Automatic Speech Recognition

2023 International Conference on Intelligent Systems, Advanced Computing and Communication (ISACC) (2023)

Abstract
The visual cues obtained from the face and mouth region of a speaker provide valuable information for speech perception. The idea of audio visual speech recognition is to combine visual information with acoustic speech signals to improve the intelligibility of speech in the presence of ambient noise. In audio visual speech recognition, lip image sequences of the speaker are used along with acoustic signals to convert speech into text. Researchers are exploring ways to improve the performance of audio visual speech recognition and to solve real-life problems such as designing voice dialling systems and highly secure biometric authentication systems. This work reviews recent research on audio visual automatic speech recognition using traditional machine learning, neural networks, and other deep learning techniques. It identifies future research opportunities through a comparative analysis of the techniques used in the literature for the different stages of audio visual speech recognition, including region-of-interest detection, audio and visual speech feature extraction, and fusion of the modalities.
Keywords
audio visual speech recognition, feature extraction, deep learning