Audio-visual fusion using Bayesian model combination for web video retrieval

MM '11: ACM Multimedia Conference, Scottsdale, Arizona, USA, November 2011

Citations: 7 | Views: 41
Abstract
Combining features from multiple heterogeneous audio-visual sources can significantly improve retrieval performance on consumer-domain videos. However, such videos often contain unrelated overlaid audio, or too much camera motion for visual features to be extracted reliably. We present an approach that overcomes errors in individual feature streams by combining classifiers trained on multiple heterogeneous feature streams using Bayesian model combination (BAYCOM). We demonstrate our method by combining low-level audio and visual features to classify a large, 200-hour web video corpus. The combined models outperform each of the individual features by 10%. Further, BAYCOM consistently outperforms traditional early- and late-fusion methods.
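The abstract does not spell out how BAYCOM weighs the per-stream classifiers, so the following is only a minimal, simplified sketch of generic Bayesian model combination, not the authors' implementation. It assumes: (a) each feature stream (e.g. audio, visual) has a classifier that outputs P(positive) for one binary concept; (b) candidate combinations are linear weight vectors drawn from a symmetric Dirichlet prior; (c) each candidate is scored by the Bernoulli log-likelihood of held-out labels under a uniform prior over the sampled candidates; (d) the final score averages all candidates, weighted by their posteriors. All function names (`fit_baycom`, `predict_baycom`, `late_fusion`) and parameters (`n_samples`, `alpha`) are hypothetical. `late_fusion` is a plain unweighted average, included only as a baseline for contrast.

```python
import math
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one weight vector from a symmetric Dirichlet(alpha) over k streams."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(g)
    return [x / s for x in g]


def combine(weights, stream_probs, i):
    """Weighted average of the streams' P(positive) for sample i."""
    return sum(w * p[i] for w, p in zip(weights, stream_probs))


def log_likelihood(weights, stream_probs, labels, eps=1e-9):
    """Bernoulli log-likelihood of binary labels under one weighted combination."""
    ll = 0.0
    for i, y in enumerate(labels):
        # Clip to avoid log(0) when a combination is fully confident.
        p = min(max(combine(weights, stream_probs, i), eps), 1.0 - eps)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


def fit_baycom(stream_probs, labels, n_samples=200, alpha=1.0, seed=0):
    """Sample candidate weightings and compute each one's posterior (uniform prior)."""
    rng = random.Random(seed)
    k = len(stream_probs)
    candidates = [sample_dirichlet(alpha, k, rng) for _ in range(n_samples)]
    lls = [log_likelihood(w, stream_probs, labels) for w in candidates]
    m = max(lls)  # subtract the max before exponentiating, for numerical stability
    unnorm = [math.exp(ll - m) for ll in lls]
    z = sum(unnorm)
    return candidates, [u / z for u in unnorm]


def predict_baycom(candidates, posteriors, stream_probs):
    """Posterior-weighted average of every candidate combination's score."""
    n = len(stream_probs[0])
    return [
        sum(post * combine(w, stream_probs, i) for w, post in zip(candidates, posteriors))
        for i in range(n)
    ]


def late_fusion(stream_probs):
    """Baseline: unweighted average of the per-stream scores."""
    k = len(stream_probs)
    return [sum(p[i] for p in stream_probs) / k for i in range(len(stream_probs[0]))]


if __name__ == "__main__":
    # Toy held-out data: the audio stream is informative, the visual stream is noisy
    # (standing in for, e.g., heavy camera motion).
    audio = [0.9, 0.2, 0.8, 0.3]
    visual = [0.5, 0.5, 0.4, 0.6]
    labels = [1, 0, 1, 0]
    cands, post = fit_baycom([audio, visual], labels)
    print("BAYCOM:", predict_baycom(cands, post, [audio, visual]))
    print("late fusion:", late_fusion([audio, visual]))
```

On this toy data the posterior shifts weight toward the more reliable audio stream, whereas plain late fusion weights both streams equally regardless of how well each one explains the labels. Because every candidate here is a linear combination, the posterior-weighted average is equivalent to using a single expected weight vector; the explicit sum is kept to mirror the model-combination formulation.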