Audio-visual tracking of a variable number of speakers with a random finite set approach

Information Fusion (2014)

Abstract
Speaker tracking in smart environments has attracted increasing attention in the past few years. Our recent studies show that fusing audio and visual modalities can improve robustness and accuracy over tracking systems based on a single modality in some challenging scenarios such as occlusions (caused by the limited field of view of cameras or by other speakers). In these previous works, however, the number of speakers is assumed to be known and to remain fixed throughout the tracking process. In this paper, we focus on a more realistic and complex scenario in which the number of speakers is unknown and varies with time. We extend random finite set (RFS) theory to multi-modal data and devise a particle filter algorithm under the RFS framework for audio-visual (AV) tracking. Experiments on the AV16.3 dataset show that the proposed algorithm can track both the number of speakers and their positions in challenging scenarios such as occlusions.
Keywords
particle filtering (numerical methods), set theory, speaker recognition, target tracking, video signal processing, AV16.3 dataset, RFS framework, RFS theory, audio modalities, audio-visual tracking system, multimodal data, particle filter algorithm, random finite set approach, smart environments, speaker tracking process, visual modalities, Audio-visual speaker tracking, random finite set