Improving scene attribute recognition using web-scale object detectors

Computer Vision and Image Understanding (2015)

Abstract
Humans often describe scenes by their affordances, which are suggested by objects. Object detectors trained at the web scale can improve scene attribute recognition. We experiment on a semi-supervised continuous learner and a supervised deep network. Learned models capture intuitive and useful object-scene attribute relationships.

Semantic attributes enable a richer description of scenes than basic category labels. While traditionally scenes have been analyzed using global image features such as Gist, recent studies suggest that humans often describe scenes in ways that are naturally characterized by local image evidence. For example, humans often describe scenes by their functions or affordances, which are largely suggested by the objects in the scene. In this paper, we leverage a large collection of modern object detectors trained at the web scale to derive effective high-level features for scene attribute recognition. We conduct experiments using two modern object detection frameworks: a semi-supervised learner that continuously learns object models from web images, and a state-of-the-art deep network. The detector response features improve the state of the art on the standard scene attribute benchmark by 5% average precision, and also capture intuitive object-scene relationships, such as the positive correlation of castles with "vacationing/touring" scenes.
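To make the idea of "detector response features" concrete, here is a minimal Python sketch. It assumes each image yields a list of (detector index, score) pairs and pools them into one fixed-length vector, then scores a ranking with average precision, the metric the abstract reports. The pooling rule (max over detections), the data layout, and the function names are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def detector_response_features(
    detections: Sequence[Tuple[int, float]], num_detectors: int
) -> List[float]:
    """Max-pool per-detector scores over one image into a fixed-length vector.

    `detections` is a hypothetical list of (detector_index, score) pairs with
    non-negative scores; detectors that never fire keep a response of 0.0.
    """
    features = [0.0] * num_detectors
    for detector_index, score in detections:
        features[detector_index] = max(features[detector_index], score)
    return features


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision of a ranking: mean of precision at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

In such a setup, the pooled vectors would feed one classifier per scene attribute, and `average_precision` would be computed per attribute from the classifier scores and ground-truth labels.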
Keywords
Affordances, Scene understanding, Semantic attributes, Semantic features