Multimedia classification and event detection using double fusion

Multimedia Tools and Applications(2013)

引用 99|浏览145
暂无评分
摘要
Multimedia Event Detection(MED) is a multimedia retrieval task with the goal of finding videos of a particular event in video archives, given example videos and event descriptions; different from MED, multimedia classification is a task that classifies given videos into specified classes. Both tasks require mining features of example videos to learn the most discriminative features, with best performance resulting from a combination of multiple complementary features. How to combine different features is the focus of this paper. Generally, early fusion and late fusion are two popular combination strategies. The former one fuses features before performing classification and the latter one combines output of classifiers from different features. Early fusion can better capture the relationship among features yet is prone to over-fit the training data. Late fusion deals with the over-fitting problem better but does not allow classifiers to train on all the data at the same time. In this paper, we introduce a fusion scheme named double fusion, which simply combines early fusion and late fusion together to incorporate their advantages. Results are reported on the TRECVID MED 2010, MED 2011, UCF50 and HMDB51 datasets. For the MED 2010 dataset, we get a mean minimal normalized detection cost (MMNDC) of 0.49, which exceeds the state-of-the-art performance by more than 12 percent. On the TRECVID MED 2011 test dataset, we achieve a MMNDC of 0.51, which is the second best among all 19 participants. On UCF50 and HMDB51, we obtain classification accuracy of 88.1 % and 48.7 % respectively, which are the best reported results to date.
更多
查看译文
关键词
Feature combination,Early fusion,Late fusion,Double fusion,Multimedia event detection
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要