Motion history images for online speaker/signer diarization

Acoustics, Speech and Signal Processing (2014)

Abstract
We present a solution to the problem of online speaker/signer diarization: the task of determining who spoke or signed when. Our solution is based on the idea that gestural activity (hand and body movement) is highly correlated with uttering activity. This correlation is necessarily true for sign languages and mostly true for spoken languages. The novel part of our solution is the use of motion history images (MHI) as a likelihood measure for probabilistically detecting uttering activity. An MHI is an efficient representation of where and how motion occurred over a fixed period of time. We conducted experiments on 4.9 hours of the AMI meeting data and 1.4 hours of a sign language dataset (the Kata Kolok data). The best performance obtained, measured as diarization error rate (DER), is 15.70% for sign language and 31.90% for spoken language. These results show that our solution is applicable to real-world applications such as video conferencing and information retrieval.
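To make the MHI idea concrete, here is a minimal sketch of the classic motion history update: a pixel that just moved is set to a maximum value `tau`, and every other pixel decays by one per frame until it reaches zero. The abstract does not give the paper's exact formulation, so everything here is an assumption for illustration: the frame-differencing motion detector, the `diff_threshold` and `tau` values, and the hypothetical `motion_score` helper, which merely stands in for the likelihood measure the authors derive from the MHI. Plain nested lists are used to keep the example dependency-free; a real implementation would use image arrays.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel intensities


def update_mhi(mhi: Frame, prev: Frame, curr: Frame,
               tau: int = 5, diff_threshold: int = 30) -> Frame:
    """Return the next motion history image from two consecutive frames.

    Assumed update rule (classic MHI, not confirmed as the paper's variant):
    pixels whose intensity changed by at least diff_threshold are set to tau;
    all other pixels decay by 1 per frame, floored at 0.
    """
    out: Frame = []
    for mhi_row, prev_row, curr_row in zip(mhi, prev, curr):
        row = []
        for h, p, c in zip(mhi_row, prev_row, curr_row):
            if abs(c - p) >= diff_threshold:
                row.append(tau)            # fresh motion: brightest value
            else:
                row.append(max(0, h - 1))  # older motion fades out
        out.append(row)
    return out


def motion_score(mhi: Frame, tau: int = 5) -> float:
    """Hypothetical activity score in [0, 1]: mean MHI value divided by tau."""
    n = sum(len(row) for row in mhi)
    total = sum(sum(row) for row in mhi)
    return total / (tau * n) if n else 0.0


if __name__ == "__main__":
    f0 = [[0, 0], [0, 0]]
    f1 = [[100, 0], [0, 0]]   # one pixel changes between f0 and f1
    mhi = [[0, 0], [0, 0]]
    mhi = update_mhi(mhi, f0, f1, tau=3)
    print(mhi, motion_score(mhi, tau=3))
    mhi = update_mhi(mhi, f1, f1, tau=3)  # no new motion: values decay
    print(mhi, motion_score(mhi, tau=3))
```

In a diarization setting, one such score could be computed per participant region and compared across participants at each time step; how the paper actually turns MHI values into a probabilistic decision is not stated in the abstract.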
Keywords
image motion analysis, image representation, maximum likelihood estimation, speaker recognition, AMI meeting data, Kata Kolok data, MHI, gestural activity, information retrieval, likelihood measure, motion history images, online speaker-signer diarization, sign language dataset, sign languages, spoken languages, uttering activity, video conferences, speaker diarization, motion energy images, signer diarization