Person Tracking Using Audio-Video Sensor Fusion

msra(2001)

Abstract
Audio and video signals originating from the same source tend to be related. To achieve optimal performance, a tracking system must exploit not just the statistics of each modality alone, but also the relationships between the two. Consider a system that tracks moving objects. Such a system may use video data to track the spatial location of an object. If the object emits sound, the system may also use audio data captured by a microphone array to track its location from the time delay of arrival of the audio signals at the different microphones. A tracker that exploits both modalities may be more robust and achieve better performance than one that uses either alone, because each modality may compensate for the weaknesses of the other. For example, a tracker using only video data may mistake the background for the object or lose track of the object due to occlusion, whereas a tracker that also uses audio data could continue tracking the object by following its sound pattern. Conversely, video data could help where an audio-only tracker would fail, such as when the object stops emitting sound or is masked by background noise.
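To make the audio cue concrete, the following is a minimal sketch of estimating the time delay of arrival between two microphones by locating the peak of their cross-correlation. It is an illustration under stated assumptions, not the method of the paper: the function name `estimate_delay`, the two-microphone setup, the synthetic noise source, and the 16 kHz sample rate are all hypothetical choices made for this example.

```python
import numpy as np


def estimate_delay(sig_a, sig_b, sample_rate):
    """Estimate the delay (in seconds) of sig_b relative to sig_a.

    Hypothetical helper for illustration: plain cross-correlation
    peak picking, with no weighting or sub-sample interpolation.
    """
    # Full cross-correlation of the two microphone signals.
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index 0 of `corr` corresponds to a lag of -(len(sig_a) - 1) samples.
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag / sample_rate


# Synthetic example (assumed values): the same noise burst reaches
# microphone B five samples after it reaches microphone A.
sample_rate = 16000
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
mic_a = source
mic_b = np.concatenate([np.zeros(5), source[:-5]])

delay = estimate_delay(mic_a, mic_b, sample_rate)
print(f"estimated delay: {delay * 1e6:.1f} microseconds")
```

A positive result means the sound reached microphone A first. In a fused tracker, a delay estimate of this kind would be one observation among several, combined with the video-based location estimate rather than used on its own.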