Logistic Regression is Still Alive and Effective: The 3rd YouTube 8M Challenge Solution of the IVUL-KAUST team


In this report, we present our solution for the 3rd YouTube-8M Video Understanding Challenge, whose task is the temporal localization of topics within a video. Our team achieved 9th place on the Public Leaderboard and 11th place on the Private Leaderboard, with a difference of 4.5 × 10⁻⁴ from the 10th gold medal winner. Overall, we train 20 different models independently and use their ensemble to predict segment scores. Together with a video classifier, this ensemble generates the final scores for each segment. We use one-loss or two-loss training strategies for different models to make full use of both video-level and segment-level annotations. Furthermore, we adopt a teacher-student model and deep clustering to generate pseudo-labels, increasing the amount of fully annotated data.

1. Brief problem description

The task of the 3rd YouTube-8M Video Understanding Challenge is to predict a topic for each 5-second segment within a video. Videos are annotated with over 3800 topics, but only 1000 are included in the final evaluation for segment-level predictions [1]. The training set includes only video-level annotations, whereas the validation set includes both video-level and segment-level annotations. Predictions are evaluated based on the Mean Average Precision (mAP) @ 100,000, which is computed as follows
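
As a rough illustration of this kind of metric, the Python sketch below computes a per-class average precision truncated at the top K predictions and then averages it over classes. It assumes the commonly used definition, in which the precision at each relevant rank is summed and divided by the number of positive segments for that class. The function names `average_precision_at_k` and `mean_average_precision_at_k`, the input layout, and the normalization are assumptions made for illustration, not the challenge's official evaluation code.

```python
from typing import Dict, List, Tuple

# One prediction for a class: (confidence score, whether the segment is truly positive).
Prediction = Tuple[float, bool]


def average_precision_at_k(scored: List[Prediction], num_positives: int, k: int) -> float:
    """Average precision for one class over its top-k predictions.

    Assumed normalization: divide by the total number of positive
    segments for the class (num_positives), not by the number of hits.
    """
    if num_positives == 0:
        return 0.0
    # Rank predictions by confidence, highest first, and keep the top k.
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    hits = 0
    precision_sum = 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_positives


def mean_average_precision_at_k(
    per_class: Dict[str, Tuple[List[Prediction], int]], k: int
) -> float:
    """Mean of the per-class average precision values.

    per_class maps a class name to (predictions, number of positives).
    """
    if not per_class:
        return 0.0
    total = sum(average_precision_at_k(preds, n, k) for preds, n in per_class.values())
    return total / len(per_class)


# Toy data with two hypothetical classes.
toy = {
    "class_a": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "class_b": ([(0.5, False), (0.4, True)], 1),
}
toy_map = mean_average_precision_at_k(toy, k=100000)
```

With K = 100,000, as in the challenge, the truncation only matters for classes that receive more than 100,000 predicted segments; on the toy data above it has no effect.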