CAM - Uninteresting Speech Detector.

INTERSPEECH (2020)

Abstract
Voice assistants such as Siri and Alexa usually adopt a pipeline to process users' utterances, which generally includes transcribing the audio into text, understanding the text, and finally responding to the user. One potential issue is that some utterances are devoid of any interesting speech and are thus not worth processing through the entire pipeline. Examples of uninteresting utterances include those that contain too much noise or no intelligible speech. It is therefore desirable to have a model that filters out such utterances before they are ingested for downstream processing, thus saving system resources. To this end, we propose the Combination of Audio and Metadata (CAM) detector to identify utterances that contain only uninteresting speech. Our experimental results show that the CAM detector considerably outperforms both an audio-only model and a metadata-only model, which demonstrates the effectiveness of the proposed system.
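
The filtering idea can be illustrated with a minimal sketch. The abstract does not say how the audio and metadata signals are combined, so the weighted average below is purely an assumption for illustration and not the authors' method. The names `Utterance`, `combined_score`, `should_process`, and `filter_utterances` are hypothetical, as are the two score fields and the default threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    # Hypothetical inputs: each is an estimated probability that the
    # utterance is uninteresting, produced by a separate upstream model.
    audio_score: float
    metadata_score: float


def combined_score(utt: Utterance, audio_weight: float = 0.5) -> float:
    """Fuse the two scores with a weighted average (assumed fusion rule)."""
    return audio_weight * utt.audio_score + (1.0 - audio_weight) * utt.metadata_score


def should_process(utt: Utterance, threshold: float = 0.5) -> bool:
    """Return True if the utterance should continue to the downstream pipeline."""
    return combined_score(utt) < threshold


def filter_utterances(utterances: List[Utterance], threshold: float = 0.5) -> List[Utterance]:
    """Drop utterances flagged as uninteresting before transcription and understanding."""
    return [u for u in utterances if should_process(u, threshold)]
```

In this sketch the gate runs before any transcription, so a rejected utterance costs only the two lightweight scoring calls. The threshold trades saved resources against the risk of discarding genuine requests.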
Keywords
audio event detection, acoustic scene classification