Exploring Attention Mechanism For Acoustic-Based Classification Of Speech Utterances Into System-Directed And Non-System-Directed

2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Voice-controlled virtual assistants (VAs) are now available in smartphones, cars, and standalone devices in homes. In most cases, the user first needs to "wake up" the VA by saying a particular word or phrase every time he or she wants the VA to do something. Eliminating the need to say the wake-up word for every interaction could improve the user experience. This would require the VA to be able to understand whether the user is talking to it or not. In other words, the challenge is to distinguish between system-directed and non-system-directed speech utterances. In this paper, we present a number of neural network architectures for tackling this classification problem using only the acoustic signal. It is shown that a model composed of convolutional, recurrent, and feed-forward layers can achieve an equal error rate (EER) below 20% for this task. In addition, we investigate the use of an attention mechanism to help the model focus on the more important parts of the signal and to improve the handling of variable-length input sequences. The results show that the proposed attention mechanism significantly improves model accuracy, achieving EERs of 16.25% and 15.62% on two distinct realistic datasets.
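The attention mechanism described above can be illustrated with a minimal sketch: each frame embedding produced by the convolutional/recurrent encoder receives a learned scalar score, the scores are normalized with a softmax, and the weighted sum yields a fixed-size utterance vector regardless of sequence length. The function and variable names below are hypothetical, and the scoring function is a simple dot-product variant assumed for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w):
    """Collapse a variable-length sequence of frame embeddings H (T x d)
    into a single utterance-level vector of size d via attention weights.
    (Hypothetical sketch; the paper's scoring function may differ.)"""
    scores = H @ w              # one scalar relevance score per frame, shape (T,)
    alpha = softmax(scores)     # attention weights over frames, sum to 1
    return alpha @ H            # weighted sum over frames -> shape (d,)

# Usage: sequences of different lengths map to the same output size,
# which is what lets the classifier handle variable-length inputs.
rng = np.random.default_rng(0)
d = 8
w = rng.standard_normal(d)      # learned scoring vector (random here)
for T in (50, 120):
    H = rng.standard_normal((T, d))
    c = attention_pool(H, w)
    print(T, c.shape)           # output shape is (d,) for every T
```

The fixed-size pooled vector can then be fed to the feed-forward layers for the binary system-directed vs. non-system-directed decision.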
Keywords
Human-machine interaction, spoken utterance classification, wake-up word, attention mechanism