Layered dynamic mixture model for pattern discovery in asynchronous multi-modal streams [video applications]
2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols 1-5: Speech Processing (2005)
Abstract
We propose a layered dynamic mixture model that performs asynchronous multi-modal fusion for unsupervised pattern discovery in video. The lower layer of the model uses generative temporal structures, such as a hierarchical hidden Markov model, to convert the audio-visual streams into mid-level labels; it also models the correlations in text with probabilistic latent semantic analysis. The upper layer fuses the statistical evidence across diverse modalities with a flexible meta-mixture model that assumes only loose temporal correspondence between streams. Evaluation on a large news database shows that multi-modal clusters correspond better to news topics than clusters built from audio-visual features alone. Novel analysis techniques suggest that meaningful clusters occur when the salient features predicted by the model concur with those shown in the story clusters.
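The upper-layer fusion step can be illustrated with a minimal sketch. It is not the paper's meta-mixture model: it assumes that the lower layer has already produced timestamped integer labels per modality, that "loose temporal correspondence" can be approximated by counting labels inside a fixed window around each story, and that modalities are independent multinomials given the cluster. The names `window_counts` and `fit_mixture`, the window half-width, and the smoothing constant `alpha` are all hypothetical.

```python
import numpy as np

def window_counts(times, labels, n_labels, centers, half_width):
    """Histogram the labels whose timestamps lie within +/- half_width of each center.

    Assumption: a fixed window stands in for loose temporal correspondence.
    """
    times = np.asarray(times, dtype=float)
    labels = np.asarray(labels, dtype=int)
    counts = np.zeros((len(centers), n_labels))
    for i, c in enumerate(centers):
        mask = np.abs(times - c) <= half_width
        counts[i] = np.bincount(labels[mask], minlength=n_labels)
    return counts

def fit_mixture(count_mats, n_clusters, n_iter=50, seed=0, alpha=1e-2):
    """EM for a mixture of multinomials over several modalities.

    count_mats: list of (n_stories, n_labels_m) count matrices, one per modality.
    Returns an (n_stories, n_clusters) matrix of soft cluster assignments.
    Assumption: modalities are conditionally independent given the cluster.
    """
    rng = np.random.default_rng(seed)
    n = count_mats[0].shape[0]
    resp = rng.dirichlet(np.ones(n_clusters), size=n)  # random soft start
    for _ in range(n_iter):
        # M-step: cluster weights and per-modality label distributions.
        weights = resp.sum(axis=0) / n
        log_lik = np.tile(np.log(weights + 1e-12), (n, 1))
        for X in count_mats:
            theta = resp.T @ X + alpha                 # smoothed expected counts
            theta /= theta.sum(axis=1, keepdims=True)
            log_lik += X @ np.log(theta).T             # add this modality's evidence
        # E-step: normalize in log space for numerical stability.
        log_lik -= log_lik.max(axis=1, keepdims=True)
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp
```

A call would pass one count matrix per modality, for example `fit_mixture([av_counts, text_counts], n_clusters=2)`, where each matrix comes from `window_counts` applied to that modality's label stream. Because each modality is windowed separately, the streams do not need to be sampled at the same rate or aligned frame by frame.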
Keywords
correlation methods, hidden Markov models, multimedia computing, pattern recognition, statistical analysis, video signal processing, asynchronous multimodal fusion, asynchronous multimodal video streams, audiovisual streams, hierarchical hidden Markov model, layered dynamic mixture model, meta-mixture model, multimodal clusters, pattern discovery, probabilistic latent semantic analysis, text correlations, unsupervised pattern discovery