Columbia-IBM News Video Story Segmentation in TRECVID 2004


ABSTRACT In this technical report, we give an overview of our technical developments in the story segmentation task in TRECVID 2004. Among them, we propose an information-theoretic framework, visual cue cluster construction (VC3), to automatically discover adequate mid-level features. The problem is posed as mutual information maximization, through which optimal cue clusters are discovered to preserve the highest information about the semantic labels. We extend the Information Bottleneck framework to high-dimensional continuous features and further propose a projection method to map each video into probabilistic memberships over all the cue clusters. The biggest advantage of the proposed approach is that it removes the dependence on the manual process of choosing the mid-level features and the huge labor cost involved in annotating the training corpus for training the detector of each mid-level feature. When tested in TRECVID 2004 news video story segmentation, the proposed approach achieves a promising performance gain over representations derived from conventional clustering techniques and even over the mid-level features selected manually; meanwhile, it achieved one of the top performances, F1=0.65, close to the highest performance, F1=0.69, by other groups. We also experiment with other promising visual features and continue investigating effective prosody features. The introduction of post-processing also provides practical improvements. Furthermore, the gains from fusing other modalities, such as speech prosody features and ASR-based segmentation scores, are significant and have been confirmed again in
feature selection, information bottleneck, projection method, visual cues, mutual information
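
The mutual information criterion mentioned in the abstract can be illustrated with a small discrete example. The sketch below is not the authors' method, which works on high-dimensional continuous features; it only shows, under toy assumptions, how merging cue clusters changes the information I(C;Y) kept about the semantic labels. The function names, the `joint` table, and its probabilities are hypothetical and chosen purely for illustration.

```python
import math


def mutual_information(joint):
    """Return I(C;Y) in bits for a joint distribution given as rows p(c, y)."""
    p_c = [sum(row) for row in joint]          # marginal over clusters
    p_y = [sum(col) for col in zip(*joint)]    # marginal over labels
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log2(p / (p_c[i] * p_y[j]))
    return mi


def merge_clusters(joint, a, b):
    """Merge cluster rows a and b by summing their joint probabilities."""
    merged = [x + y for x, y in zip(joint[a], joint[b])]
    rest = [row for k, row in enumerate(joint) if k not in (a, b)]
    return rest + [merged]


# Hypothetical joint distribution p(cluster, label):
# 3 cue clusters x 2 labels (story boundary, non-boundary).
joint = [
    [0.30, 0.05],
    [0.25, 0.05],
    [0.05, 0.30],
]

full = mutual_information(joint)
similar = mutual_information(merge_clusters(joint, 0, 1))     # similar label profiles
dissimilar = mutual_information(merge_clusters(joint, 0, 2))  # opposite label profiles
print(f"all clusters: {full:.4f} bits")
print(f"merge 0 and 1: {similar:.4f} bits")
print(f"merge 0 and 2: {dissimilar:.4f} bits")
```

In this toy setting, merging two clusters with similar label distributions loses little information, while merging clusters with opposite label distributions loses much more. Bottleneck-style clustering relies on the same intuition to decide which clusters to keep separate.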