Paralinguistic Event Detection From Speech Using Probabilistic Time-Series Smoothing And Masking
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 (2013)
Abstract
Non-verbal speech cues serve multiple functions in human interaction, such as maintaining conversational flow and expressing emotions, personality, and interpersonal attitude. In particular, non-verbal vocalizations such as laughter are associated with affective expression, while vocal fillers are used to hold the floor during a conversation. The Interspeech 2013 Social Signals Sub-Challenge involves detecting these two types of non-verbal signals in telephonic speech dialogs. We extend the challenge baseline system by applying filtering and masking techniques to probabilistic time series that represent the occurrence of a vocal event. On the test set, we obtain an improved area under the receiver operating characteristic (ROC) curve of 93.3% (10.4% absolute improvement) for laughter and 89.7% (6.1% absolute improvement) for fillers. This improvement suggests the importance of temporal context for detecting these paralinguistic events.
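The abstract does not specify which filter or mask was used, so the following is only a minimal sketch of the general idea, under stated assumptions: a centered moving average stands in for the smoothing step, and the masking step is assumed to zero out above-threshold runs that are too short to be a plausible vocal event. The function names, window size, threshold, and minimum run length are all hypothetical and are not taken from the paper.

```python
def moving_average(probs, window=5):
    """Smooth per-frame event probabilities with a centered moving average.

    Assumption: a plain moving average stands in for the unspecified filter.
    The window is truncated at the sequence edges.
    """
    half = window // 2
    smoothed = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))
    return smoothed


def mask_short_runs(probs, threshold=0.5, min_frames=3):
    """Zero out above-threshold runs shorter than min_frames.

    Assumption: masking is modeled here as suppressing detections that are
    too brief; the paper's actual masking rule may differ.
    """
    masked = list(probs)
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, p in enumerate(list(probs) + [float("-inf")]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            if i - start < min_frames:
                for j in range(start, i):
                    masked[j] = 0.0
            start = None
    return masked


# Hypothetical per-frame probabilities of a laughter event.
frame_probs = [0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.85, 0.1]
cleaned = mask_short_runs(frame_probs, threshold=0.5, min_frames=3)
smoothed = moving_average(frame_probs, window=3)
```

In this toy input the isolated spike at the second frame is suppressed while the three-frame run is kept, which illustrates how temporal context can remove spurious single-frame detections.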
Keywords
Non-verbal vocalizations, receiver operating characteristic, time series smoothing, time series masking