Large-Scale Video Classification with Convolutional Neural Networks

Andrej Karpathy,George Toderici,Sanketh Shetty,Thomas Leung,Rahul Sukthankar,Li Fei-Fei

Computer Vision and Pattern Recognition（2014）

引用 8357|浏览2581

暂无评分

摘要

Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).

查看译文

关键词

convolutional, neural, network, video, classification, action, recognition, large-scale, dataset, sports,feature extraction,convolutional neural networks,classification,computer architecture,image classification,spatial resolution,dataset,computational modeling,neural,network,neural nets,action,sports,video

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要