Surveillance Video Analysis with External Knowledge and Internal Constraints


The automated analysis of video data becomes ever more important as we are inundated with the videos generated every day, which has led to much research on tasks such as content-based video retrieval, pose estimation, and surveillance video analysis. Current state-of-the-art algorithms for these tasks are mainly supervised, i.e., they learn models from manually labeled training data. However, it is difficult to manually collect large quantities of high-quality labeled data. Therefore, in this thesis, we propose to circumvent this problem by automatically harvesting and exploiting useful information from unlabeled video based on 1) out-of-domain external knowledge sources and 2) internal constraints in video. Two tasks in the surveillance domain were targeted: multi-object tracking and pose estimation.

Being able to localize and identify each individual at each time instant would be extremely useful in surveillance video analysis. We tackled this challenge by formulating it as an identity-aware multi-object tracking problem. An existing out-of-domain knowledge source, face recognition, and an internal constraint, the spatial-temporal smoothness constraint, were used in a joint optimization framework to localize each person. The spatial-temporal smoothness constraint was further utilized to automatically collect large amounts of multi-view person re-identification training data. This data was used to train deep person re-identification networks, which further enhanced tracking performance on our 23-day, 15-camera data set consisting of 4,935 hours of video. Results show that our tracker has the ability to locate a person …
