Predicting Facial Attributes in Video Using Temporal Coherence and Motion-Attention

2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018

Recent research progress in facial attribute recognition has been dominated by small improvements on the only large-scale publicly available benchmark dataset, CelebA [18]. We propose to extend attribute prediction research to unconstrained videos. Applying attribute models trained on CelebA, a still-image dataset, to video data highlights several major problems with current models, including the lack of consideration for both time and motion. Many facial attributes (e.g., gender, hair color) should be consistent throughout a video; however, current models do not produce consistent results. We introduce two methods to increase the consistency and accuracy of attribute responses in videos: a temporal coherence constraint and a motion-attention mechanism. Both methods work on weakly labeled data, requiring attribute labels for only one frame in a sequence, which we call the anchor frame. The temporal coherence constraint moves the network responses of non-anchor frames toward the responses of the anchor frame for each sequence, resulting in more stable and accurate attribute predictions. We use the motion between anchor and non-anchor video frames as an attention mechanism, discarding the information from parts of the non-anchor frame where no motion occurred. This motion-attention focuses the network on the moving parts of the non-anchor frames (i.e., the face). Since there is no large-scale video dataset labeled with attributes, it is essential for attribute models to be able to learn from weakly labeled data. We demonstrate the effectiveness of the proposed methods by evaluating them on the challenging YouTube Faces video dataset [31]. The proposed motion-attention and temporal coherence methods outperform attribute models trained on CelebA, as well as those fine-tuned on video data. To the best of our knowledge, this paper is the first to address the problem of facial attribute prediction in video.
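The two mechanisms described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not give the exact loss or the motion estimator, so the squared-distance penalty, the frame-differencing motion estimate, the `threshold` parameter, and the function names `temporal_coherence_penalty` and `motion_attention` are all assumptions chosen for illustration.

```python
import numpy as np


def temporal_coherence_penalty(anchor_scores, frame_scores):
    """Mean squared distance between non-anchor and anchor attribute scores.

    Assumed form of the constraint: a squared-distance penalty that shrinks
    as non-anchor responses move toward the anchor-frame responses.
    anchor_scores has shape (A,), frame_scores has shape (T, A).
    """
    anchor_scores = np.asarray(anchor_scores, dtype=float)
    frame_scores = np.asarray(frame_scores, dtype=float)
    return float(np.mean((frame_scores - anchor_scores) ** 2))


def motion_attention(anchor_frame, frame, threshold=0.1):
    """Zero out parts of a non-anchor frame where no motion occurred.

    Motion is approximated here by absolute frame differencing against the
    anchor frame (an assumption; the abstract does not name the estimator).
    Frames are arrays of shape (H, W) or (H, W, C).
    """
    anchor_frame = np.asarray(anchor_frame, dtype=float)
    frame = np.asarray(frame, dtype=float)
    motion = np.abs(frame - anchor_frame)
    if motion.ndim == 3:
        # Collapse colour channels so one mask value covers each pixel.
        motion = motion.max(axis=-1, keepdims=True)
    mask = motion > threshold
    return frame * mask
```

In this sketch, only the anchor frame would need attribute labels: the penalty compares network outputs with each other, and the mask is computed from pixel values alone, which is what makes both ideas compatible with weakly labeled sequences.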
motion-attention, facial attribute recognition, large-scale publicly available benchmark dataset, attribute prediction research, unconstrained videos, attribute models, image dataset, attribute responses, temporal coherence constraint, weakly labeled data, attribute labels, anchor frame, non-anchor frame, stable attribute predictions, accurate attribute predictions, attention mechanism, large-scale video dataset, challenging YouTube Faces video, temporal coherence methods, facial attribute prediction, CelebA