Gaze-enhanced speech recognition

Acoustics, Speech and Signal Processing (2014)

Cited by 17 | Views 29
Abstract
This work demonstrates, through simulations and experimental work, the potential of eye-gaze data to improve speech-recognition results. Multimodal interfaces, where users see information on a display and use their voice to control an interaction, are of growing importance as mobile phones and tablets grow in popularity. We demonstrate an improvement in speech-recognition performance, as measured by word error rate, by rescoring the output of a large-vocabulary speech-recognition system. We use eye-gaze data as a spotlight and collect bigram word statistics near where the user looks in time and space. We see a 25% relative reduction in word error rate over a generic language model, and approximately a 10% reduction in errors over a strong, page-specific baseline language model.
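The gaze-spotlight rescoring idea described in the abstract can be sketched roughly as follows. This is a minimal illustrative assumption, not the paper's actual implementation: the function names (`gaze_bigrams`, `rescore`), the add-one smoothing, and the interpolation weight `lam` are all hypothetical choices standing in for the paper's gaze-derived bigram statistics and language-model interpolation.

```python
import math
from collections import Counter

def gaze_bigrams(fixated_words):
    """Count bigrams among on-screen words collected near the gaze point.

    `fixated_words` is a hypothetical ordered list of words the user
    looked at, i.e. the "spotlight" of words near the gaze in time
    and space.
    """
    return Counter(zip(fixated_words, fixated_words[1:]))

def rescore(hypotheses, gaze_counts, base_logprob, lam=0.5):
    """Pick the best recognition hypothesis after gaze-aware rescoring.

    `hypotheses` maps each hypothesis (a tuple of words) to its baseline
    recognizer score (log probability); `base_logprob` returns a baseline
    language-model log probability for a bigram. Both interfaces are
    illustrative assumptions. Gaze bigram probabilities (add-one
    smoothed) are linearly interpolated with the baseline model.
    """
    total = sum(gaze_counts.values()) or 1
    vocab = len({w for bg in gaze_counts for w in bg}) or 1

    def bigram_lp(bg):
        # Smoothed gaze-derived bigram probability, mixed with baseline.
        p_gaze = (gaze_counts[bg] + 1) / (total + vocab * vocab)
        return math.log(lam * p_gaze + (1 - lam) * math.exp(base_logprob(bg)))

    def score(hyp):
        return hypotheses[hyp] + sum(bigram_lp(bg) for bg in zip(hyp, hyp[1:]))

    return max(hypotheses, key=score)
```

For example, if the user's gaze passed over the on-screen words "turn left at main street", the gaze bigram model boosts the hypothesis "turn left" over an acoustically similar competitor such as "turn loft", even when the competitor's baseline score is slightly higher.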
Keywords
mobile handsets, speech recognition, eye-gaze data, gaze-enhanced speech recognition, generic language model, large-vocabulary speech-recognition system, mobile phones, multimodal interfaces, tablets, Eye Gaze, Pointing, Speech Recognition