Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification.

NIPS, pp. 1378–1386, 2010

Abstract

Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meaning. For high-level visual tasks, such low-level image representations are potentially not enough…

Introduction
  • Understanding the meanings and contents of images remains one of the most challenging problems in machine intelligence and statistical learning.
  • Fig.1 illustrates the gradient-based GIST features [25] and texture-based Spatial Pyramid representation [19] of two different scenes.
  • Such schemes often fail to offer sufficient discriminative power, as one can see from the very similar image statistics in the examples in Fig.1.
Highlights
  • We show that an image representation based on objects can be very useful in high-level visual recognition tasks for scenes cluttered with objects
  • We propose a regularized logistic regression method, akin to the group lasso approach for structured sparsity, to explore both feature sparsity and object sparsity in the Object Bank representation for learning and classifying complex scenes
  • Dataset: We evaluate the Object Bank representation on four scene datasets, ranging from generic natural scene images (15-Scene, the LabelMe 9-class scene dataset), to cluttered indoor images (MIT Indoor Scene), and to complex event and activity images (UIUC-Sports)
  • We investigate the behaviors of different structural risk minimization schemes over logistic regression on the Object Bank representation
  • As we try to tackle higher-level visual recognition problems, we show that the Object Bank representation is powerful on scene classification tasks because it carries rich semantic-level image information
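The Object Bank idea sketched in the highlights above — pooling the responses of many pre-trained object detectors over a spatial pyramid — can be illustrated in a few lines. This is a minimal sketch, not the authors' implementation: the detector response maps are assumed to be given (in the paper they come from part-based and "stuff" detectors run at multiple scales), and the 1×1/2×2/4×4 grid here is an illustrative choice.

```python
import numpy as np

def object_bank_features(response_maps, levels=(1, 2, 4)):
    """Build an Object Bank-style descriptor: for each object detector's
    response map and each spatial-pyramid cell, keep the maximum detector
    response, then concatenate everything into one vector.

    response_maps: list of 2-D arrays, one H x W response map per object
    detector (how these maps are produced is outside this sketch)."""
    feats = []
    for resp in response_maps:
        h, w = resp.shape
        for n in levels:                        # pyramid level: n x n grid
            for i in range(n):
                for j in range(n):
                    cell = resp[i * h // n:(i + 1) * h // n,
                                j * w // n:(j + 1) * w // n]
                    feats.append(cell.max())    # max response in the cell
    return np.asarray(feats)

# toy example: 3 "detectors" on a 60 x 60 image, 1 + 4 + 16 = 21 cells each
maps = [np.random.rand(60, 60) for _ in range(3)]
fv = object_bank_features(maps)
print(fv.shape)  # (63,)
```

Because every cell keeps only a max, the descriptor grows linearly in the number of detectors and pyramid cells, which is what makes the sparsification discussed later worthwhile.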
Results
  • Dataset: The authors evaluate the OB representation on four scene datasets, ranging from generic natural scene images (15-Scene, the LabelMe 9-class scene dataset), to cluttered indoor images (MIT Indoor Scene), and to complex event and activity images (UIUC-Sports).
  • Experiment setup: The authors compare OB in scene classification tasks with different types of conventional image features, such as SIFT-BoW [23, 3], GIST [25] and SPM [19].
  • An off-the-shelf SVM classifier, and an in-house implementation of the logistic regression (LR) classifier were used on all feature representations being compared.
  • As introduced in Sec. 4, the authors experimented with ℓ1-regularized LR (LR1), ℓ1/ℓ2-regularized LR (LRG) and ℓ1/ℓ2 + ℓ1-regularized LR (LRG1)
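The combined ℓ1/ℓ2 + ℓ1 (sparse-group) penalty compared in the list above can be reproduced with a small proximal-gradient solver. This is a hedged sketch, not the authors' code: the group structure (one group per object's features), the step size, and the λ values are illustrative assumptions.

```python
import numpy as np

def prox_sparse_group(w, groups, lam1, lam2):
    """Proximal operator of the sparse-group-lasso penalty
    lam1 * ||w||_1 + lam2 * sum_g ||w_g||_2 (the LRG1 setting):
    elementwise soft-thresholding followed by groupwise shrinkage."""
    w = np.sign(w) * np.maximum(np.abs(w) - lam1, 0.0)   # l1 step
    for g in groups:                                     # l1/l2 step
        norm = np.linalg.norm(w[g])
        w[g] = 0.0 if norm <= lam2 else w[g] * (1 - lam2 / norm)
    return w

def fit_lrg1(X, y, groups, lam1=0.01, lam2=0.01, lr=0.1, iters=500):
    """Proximal-gradient logistic regression with the combined penalty.
    y in {0, 1}; each entry of `groups` indexes one group of features."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid
        grad = X.T @ (p - y) / n              # logistic-loss gradient
        w = prox_sparse_group(w - lr * grad, groups, lr * lam1, lr * lam2)
    return w

# toy usage: only the first group of features is informative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = fit_lrg1(X, y, groups=[[0, 1], [2, 3], [4, 5]])
```

Setting lam2 = 0 recovers plain ℓ1 (LR1), and lam1 = 0 recovers the group lasso (LRG), so one solver covers all three regularizers being compared.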
Conclusion
  • In tackling higher-level visual recognition problems, the authors show that the Object Bank representation is powerful on scene classification tasks because it carries rich semantic-level image information.
  • The authors apply structured regularization schemes on the OB representation, and achieve nearly lossless semantic-preserving compression.
  • The authors will further test the OB representation in other vision applications, as well as other structural regularization schemes.
Tables
  • Table 1: Comparison of classification results using OB with reported state-of-the-art algorithms. Many of these algorithms use more complex models and supervised information, whereas our results are obtained by applying simple logistic regression.
Related work
  • A plethora of image descriptors have been developed for object recognition and image classification [25, 1, 23]. We particularly draw the analogy between our object bank and the texture filter banks [26, 10].

    Object detection and recognition also entail a large body of literature [7]. In this work, we mainly use the current state-of-the-art object detectors of Felzenszwalb et al. [9], as well as the geometric context classifiers (“stuff” detectors) of Hoiem et al. [13] for pre-training the object detectors.

    The idea of using object detectors as the basic representation of images is analogous to [12, 33, 35]. In contrast to our work, in [12] and [33] each semantic concept is trained by using the entire images or frames of video. As there is no localization of object concepts in scenes, understanding cluttered images composed of many objects will be challenging. In [35], a small number of concepts are trained and only the most probable concept is used to form the representation for each region, whereas in our approach all the detector responses are used to encode richer semantic information.
Funding
  • F-F is partially supported by an NSF CAREER grant (IIS-0845230), a Google research award, and a Microsoft Research Fellowship
  • X is supported by AFOSR FA9550010247, ONR N0001140910758, NSF Career DBI-0546594, NSF IIS- 0713379 and Alfred P
Reference
  • [1] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE PAMI, pages 509–522, 2002.
  • [2] L. Bourdev and J. Malik. Poselets: body part detectors trained using 3D human pose annotations. ICCV, 2009.
  • [3] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. Workshop on Statistical Learning in Computer Vision, ECCV, 2004.
  • [4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005.
  • [5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: a large-scale hierarchical image database. CVPR, 2009.
  • [6] L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. TPAMI, 2006.
  • [7] L. Fei-Fei, R. Fergus, and A. Torralba. Recognizing and learning object categories. Short Course, CVPR.
  • [8] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. CVPR, 2009.
  • [9] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. JAIR, 29, 2007.
  • [10] W.T. Freeman and E.H. Adelson. The design and use of steerable filters. IEEE PAMI, 1991.
  • [11] G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. 2007.
  • [12] A. Hauptmann, R. Yan, W. Lin, M. Christel, and H. Wactlar. Can high-level concepts fill the semantic gap in video retrieval? A case study with broadcast news. IEEE TMM, 9(5):958, 2007.
  • [13] D. Hoiem, A.A. Efros, and M. Hebert. Automatic photo pop-up. SIGGRAPH, 24(3):577–584, 2005.
  • [14] D. Hoiem, A.A. Efros, and M. Hebert. Putting objects in perspective. CVPR, 2006.
  • [15] T. Kadir and M. Brady. Scale, saliency and image description. IJCV, 45(2):83–105, 2001.
  • [16] N. Kumar, A.C. Berg, P.N. Belhumeur, and S.K. Nayar. Attribute and simile classifiers for face verification. ICCV, 2009.
  • [17] C.H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. CVPR, 2009.
  • [18] C.H. Lampert, M.B. Blaschko, T. Hofmann, and S. Zurich. Beyond sliding windows: object localization by efficient subwindow search. CVPR, 2008.
  • [19] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. CVPR, 2006.
  • [20] H. Lee, R. Grosse, R. Ranganath, and A.Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML, 2009.
  • [21] D. Lewis. Naive (Bayes) at forty: the independence assumption in information retrieval. ECML, 1998.
  • [22] L.-J. Li and L. Fei-Fei. What, where and who? Classifying events by scene and object recognition. ICCV, 2007.
  • [23] D. Lowe. Object recognition from local scale-invariant features. ICCV, 1999.
  • [24] K. Mikolajczyk and C. Schmid. An affine invariant interest point detector. ECCV, 2002.
  • [25] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 42, 2001.
  • [26] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. PAMI, 1990.
  • [27] A. Quattoni and A. Torralba. Recognizing indoor scenes. CVPR, 2009.
  • [28] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. ICCV, 2007.
  • [29] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. ICCV, 2009.
  • [30] B.C. Russell, A. Torralba, K.P. Murphy, and W.T. Freeman. LabelMe: a database and web-based tool for image annotation. MIT AI Lab Memo, 2005.
  • [31] L. von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006.
  • [32] C. Wang, D. Blei, and L. Fei-Fei. Simultaneous image classification and annotation. CVPR, 2009.
  • [33] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. ECCV, pages 776–789, 2010.
  • [34] Logistic regression. Annals of Statistics, 2009.
  • [35] J. Vogel and B. Schiele. Semantic modeling of natural scenes for content-based image retrieval. IJCV, 2007.
  • [36] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006.