Large-scale video event classification using dynamic temporal pyramid matching of visual semantics.
ICIP 2013
Abstract
Video event classification and retrieval has recently emerged as a challenging research topic. In addition to the variation in appearance of visual content and the large scale of the collections to be analyzed, this domain presents new and unique challenges in the modeling of the explicit temporal structure and implicit temporal trends of content within the video events. In this study, we present a technique for video event classification that captures temporal information over semantics using a scalable and efficient modeling scheme. An architecture for partitioning videos into a linear temporal pyramid, using segments of equal length and segments determined by the patterns of the underlying data, is applied over a rich underlying semantic description at the frame level using a taxonomy of nearly 1000 concepts containing 500,000 training images. Forward model selection with data bagging is used to prune the space of temporal features and data for efficiency. The system is implemented in the Hadoop MapReduce environment for arbitrary scalability. Our method is applied to the TRECVID Multimedia Event Detection 2012 task. Results demonstrate a significant boost in performance of over 50%, in terms of mean average precision, compared to common max or average pooling, and 17.7% compared to more complex pooling strategies that ignore temporal content.
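The core idea of the linear temporal pyramid described above can be sketched with a small, self-contained example: frame-level semantic concept scores are pooled within equal-length segments at each pyramid level, then concatenated into a fixed-length video feature. This is an illustrative sketch, not the authors' implementation; the function name, the max-pooling choice, and the use of power-of-two segment counts per level are assumptions.

```python
import numpy as np

def temporal_pyramid_pool(frame_scores, levels=3):
    """Pool a (frames x concepts) matrix of semantic scores into a
    fixed-length temporal-pyramid feature using equal-length segments.

    Level l splits the video into 2**l segments; each segment is
    max-pooled over time, and all segment features are concatenated.
    """
    T, C = frame_scores.shape  # T frames, C semantic concepts
    feats = []
    for level in range(levels):
        n_seg = 2 ** level
        bounds = np.linspace(0, T, n_seg + 1).astype(int)
        for s, e in zip(bounds[:-1], bounds[1:]):
            seg = frame_scores[s:max(e, s + 1)]  # guard empty segments
            feats.append(seg.max(axis=0))        # max pooling per segment
    return np.concatenate(feats)

# Example: 100 frames, 8 concepts -> (1 + 2 + 4) segments x 8 = 56 dims
scores = np.random.rand(100, 8)
feature = temporal_pyramid_pool(scores, levels=3)
print(feature.shape)  # (56,)
```

Note that level 0 alone reduces to plain max pooling over the whole video, which is exactly the baseline the abstract reports improving on; the deeper levels are what add the temporal structure.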
Keywords
image classification, image matching, video signal processing, Hadoop MapReduce, TRECVID Multimedia Event Detection, dynamic temporal pyramid matching, large-scale video event classification, linear temporal pyramid, semantic description, visual content, visual semantics