Recognize Complex Events From Static Images By Fusing Deep Channels

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)(2015)

引用 162|浏览105
暂无评分
摘要
A considerable portion of web images capture events that occur in our personal lives or social activities. In this paper, we aim to develop an effective method for recognizing events from such images. Despite the sheer amount of study on event recognition, most existing methods rely on videos and are not directly applicable to this task. Generally, events are complex phenomena that involve interactions among people and objects, and therefore analysis of event photos requires techniques that can go beyond recognizing individual objects and carry out joint reasoning based on evidences of multiple aspects. Inspired by the recent success of deep learning, we formulate a multi-layer framework to tackle this problem, which takes into account both visual appearance and the interactions among humans and objects, and combines them via semantic fusion. An important issue arising here is that humans and objects discovered by detectors are in the form of bounding boxes, and there is no straightforward way to represent their interactions and incorporate them with a deep network. We address this using a novel strategy that projects the detected instances onto multi-scale spatial maps. On a large dataset with 60, 000 images, the proposed method achieved substantial improvement over the state-of-the-art, raising the accuracy of event recognition by over 10%.
更多
查看译文
关键词
complex event recognition,static images,deep channel fusion,Web images,social activities,personal lives,event photos,joint reasoning,deep learning,multilayer framework,semantic fusion,bounding boxes,deep network,multiscale spatial maps
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要