Wildlife recognition in nature documentaries with weak supervision from subtitles and external data.

Pattern Recognition Letters (2016)

Abstract
We address the problem of recognizing animals in videos in the absence of visual demarcations. We propose a novel feature transformation exploiting two key properties of CNN activations. We iteratively adapt classifiers trained on an external dataset using subtitles in the target dataset. Results improve significantly over text, vision and combined baselines without the adaptation.

We propose a weakly supervised framework for domain adaptation in a multi-modal context for multi-label classification. This framework is applied to annotate objects such as animals in a target video with subtitles, in the absence of visual demarcators. We start from classifiers trained on external data (the source; in our setting, ImageNet) and iteratively adapt them to the target dataset using textual cues from the subtitles. Experiments on a challenging dataset of wildlife documentaries validate the framework, with a final F1 measure of approximately 70%. This significantly improves over the results of a state-of-the-art approach, namely applying classifiers trained on ImageNet without adaptation. The methods proposed here take us a step closer to object recognition in the wild and automatic video indexing.
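
The iterative adaptation summarized above can be pictured with a minimal sketch. This is not the authors' method: the abstract does not specify the classifier, the update rule, or the proposed feature transformation, so the per-class logistic classifiers, the confidence threshold `keep`, and every name below (`adapt_to_target`, `mentions`, and so on) are assumptions made purely for illustration. The sketch only shows the general idea of starting from source-trained weights and repeatedly retraining on target frames whose subtitle mentions a class and whose current score agrees.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adapt_to_target(W_source, X, mentions, rounds=5, steps=50, lr=0.5, keep=0.5):
    """Hypothetical weakly supervised adaptation loop (illustrative only).

    W_source: (C, d) weights of per-class linear classifiers trained on source data.
    X:        (N, d) features of target video frames.
    mentions: (N, C) 0/1 matrix, 1 if the subtitle near frame n mentions class c.
    """
    W = W_source.copy()  # leave the source classifiers untouched
    for _ in range(rounds):
        probs = sigmoid(X @ W.T)  # (N, C) current class scores
        # Assumed pseudo-labelling rule: a frame is positive for a class only
        # if the subtitle mentions it AND the current classifier agrees.
        pseudo = ((mentions == 1) & (probs >= keep)).astype(float)
        for _ in range(steps):
            # Gradient of the mean logistic loss against the pseudo-labels.
            grad = (sigmoid(X @ W.T) - pseudo).T @ X / len(X)
            W -= lr * grad
    return W

# Toy usage: two classes, two frames, each frame's subtitle mentions one class.
W0 = np.array([[0.1, 0.0], [0.0, 0.1]])
X_toy = np.eye(2)
M_toy = np.eye(2, dtype=int)
W1 = adapt_to_target(W0, X_toy, M_toy)
```

In this toy case the weight linking each class to its mentioned frame grows, while the weight linking it to the other frame shrinks. A real system would presumably need a more careful treatment of negatives, since a class absent from the subtitle may still be visible on screen.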
Keywords
Wildlife recognition,Cross-modal alignment,Domain adaptation,Multi-label classification,Incremental learning