SUN database: Large-scale scene recognition from abbey to zoo

    CVPR, pp. 3485-3492, 2010.

    Keywords:
    histograms, sun, databases, anthropometry, accuracy

    Abstract:

    Scene categorization is a fundamental problem in computer vision. However, scene understanding research has been constrained by the limited scope of currently-used databases which do not capture the full variety of scene categories. Whereas standard databases for object categorization contain hundreds of different classes of objects, the …

    Introduction
    • Whereas the fields of computer vision and cognitive science have developed several databases to organize knowledge about object categories [10, 28], a comprehensive database of real world scenes does not currently exist.
    • (Figure: a word cloud of sample SUN categories, from “apple orchard”, “arbor”, and “archipelago” through “sunken garden”.) Scenes are associated with specific actions and behaviors, such as eating in a restaurant, drinking in a pub, reading in a library, and sleeping in a bedroom.
    • Scenes, and their associated functions, are closely related to the visual features that structure the space.
    • The function of environments can be defined by their shape and size, by their constituent materials, or by embedded objects.
    Highlights
    • Whereas the fields of computer vision and cognitive science have developed several databases to organize knowledge about object categories [10, 28], a comprehensive database of real world scenes does not currently exist.
    • Rather than collect all scenes that humans experience – many of which are accidental views such as the corner of an office or edge of a door – we identify all the scenes and places that are important enough to have unique identities in discourse, and build the most complete dataset of scene image categories to date (see the WordNet sketch after this list).
    • In the previous section, we provided a rough estimate of the number of common scene types that exist in the visual world and built the extensive Scene UNderstanding database to cover as many of those scenes as possible.
    • We explore how discriminable the Scene UNderstanding categories are with a variety of image features and kernels paired with one-vs-all support vector machines.
    • For the experiments with our Scene UNderstanding database, the performance of all the features is compared in Figure 4(b).
    • We hope that the Scene UNderstanding database will help the community advance the state of scene understanding.
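    The SUN vocabulary was assembled by mining WordNet for terms that name scenes and places. Purely as an illustration, the Python sketch below shows how candidate place terms could be enumerated by walking WordNet hyponym trees with NLTK; the chosen root synsets, the depth cap, and the absence of manual curation are assumptions of this sketch, not the authors' actual selection procedure.

```python
# Minimal sketch: enumerate candidate scene/place terms by walking WordNet
# hyponym trees with NLTK (requires nltk.download("wordnet") once).
# The root synsets and depth cap are illustrative choices only.
from nltk.corpus import wordnet as wn

def hyponym_terms(root_name, max_depth=6):
    """Collect lemma names of all hyponyms under a root synset."""
    root = wn.synset(root_name)
    seen, terms = set(), set()
    stack = [(root, 0)]
    while stack:
        syn, depth = stack.pop()
        if syn in seen or depth > max_depth:
            continue
        seen.add(syn)
        terms.update(name.replace("_", " ") for name in syn.lemma_names())
        stack.extend((h, depth + 1) for h in syn.hyponyms())
    return terms

# Illustrative roots covering outdoor and indoor place-like concepts.
candidates = set()
for root in ("geological_formation.n.01", "building.n.01", "room.n.01"):
    candidates |= hyponym_terms(root)
print(len(candidates), sorted(candidates)[:10])
```

    A real pipeline would follow this with heavy manual filtering, since many hyponyms name objects or abstract concepts rather than scenes one could photograph.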
    Methods
    • Experiments and Analysis

      With the features and kernels defined above, the authors train classifiers with one-vs-all Support Vector Machines.
    • For the experiments with the SUN database, the performance of all features enumerated above is compared in Figure 4(b).
    • The “all features” classifier is built from a weighted sum of the kernels of the individual features (see the sketch after this list).
    • It is interesting that with increasing amounts of training data, the performance gain is more pronounced on the SUN dataset than on the 15-scene dataset.
    • If the authors instead use more of the SUN database to train the detectors (200 exemplars per class), they obtain the performance shown in Table 1.
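    For concreteness, here is a minimal sketch of that setup: explicit one-vs-all SVMs trained on a weighted sum of precomputed Gram matrices. The random placeholder features, the linear kernels, and the fixed mixing weights are assumptions of this sketch; the paper combines kernels computed from its actual descriptors (GIST, HOG, dense SIFT, and so on).

```python
# Sketch: one-vs-all SVMs over a weighted sum of precomputed kernels, in the
# spirit of the "all features" classifier. Random features and fixed weights
# stand in for the paper's real descriptors and weighting scheme.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 120, 30, 3
y_train = rng.integers(0, n_classes, size=n_train)
y_test = rng.integers(0, n_classes, size=n_test)

# Two placeholder feature channels (standing in for, e.g., GIST and HOG).
feats_train = [rng.normal(size=(n_train, 64)) for _ in range(2)]
feats_test = [rng.normal(size=(n_test, 64)) for _ in range(2)]
weights = [0.6, 0.4]  # assumed mixing weights, one per feature channel

# Combined Gram matrices: K = sum_i w_i * K_i (linear kernels here).
K_train = sum(w * (F @ F.T) for w, F in zip(weights, feats_train))
K_test = sum(w * (Ft @ F.T) for w, Ft, F in zip(weights, feats_test, feats_train))

# Explicit one-vs-all: one binary SVM per class on the shared combined kernel,
# then pick the class with the largest decision value.
scores = np.zeros((n_test, n_classes))
for c in range(n_classes):
    svm = SVC(kernel="precomputed", C=1.0)
    svm.fit(K_train, (y_train == c).astype(int))
    scores[:, c] = svm.decision_function(K_test)

pred = scores.argmax(axis=1)
print("accuracy:", (pred == y_test).mean())
```

    Precomputing the kernels once and reusing them across all binary classifiers is what makes the weighted-sum combination cheap: only the Gram matrices are mixed, not the features.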
    Results
    • If the authors instead focus on the “good workers”, those who performed at least 100 HITs and had accuracy greater than 95% on the relatively easy first level of the hierarchy, the leaf-level accuracy rises to 68.5% (see the filtering sketch after this list).
    • More than 90% of the sub-images in the test set have a valid scene classification.
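    As a concrete illustration of this filtering step, the pandas sketch below simulates per-worker annotation records and applies the same thresholds; the column names, worker counts, and accuracies are invented for the example and are not the paper's actual AMT data.

```python
# Sketch: keep only "good workers" (>= 100 HITs and > 95% accuracy on the easy
# first level of the hierarchy), then measure their leaf-level accuracy.
# All column names and simulated accuracies are invented for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rows = []
for worker, p_level1, p_leaf in [("w1", 0.99, 0.70), ("w2", 0.80, 0.40),
                                 ("w3", 0.98, 0.68)]:
    for _ in range(150):  # each simulated worker performs 150 HITs
        rows.append({"worker_id": worker,
                     "level1_correct": rng.random() < p_level1,
                     "leaf_correct": rng.random() < p_leaf})
hits = pd.DataFrame(rows)

stats = hits.groupby("worker_id").agg(
    n_hits=("level1_correct", "size"),
    level1_acc=("level1_correct", "mean"),
)
good = stats[(stats["n_hits"] >= 100) & (stats["level1_acc"] > 0.95)]
leaf_acc = hits[hits["worker_id"].isin(good.index)]["leaf_correct"].mean()
print("good workers:", list(good.index), "- leaf accuracy:", round(leaf_acc, 3))
```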
    Conclusion
    • To advance the field of scene understanding, the authors need datasets that encompass the richness and variety of environmental scenes, as well as knowledge about how scene categories are organized and distinguished from each other.
    • The authors have proposed a quasi-exhaustive dataset of scene categories (899 environments).
    • Using state-of-the-art algorithms for image classification, the authors have achieved new performance bounds for scene classification.
    • The authors hope that the SUN database will help the community advance the state of scene understanding.
    • The authors introduced a new task of scene detection within images.
    Tables
    • Table 1: Scene Detection Average Precision. We compare the scene detection performance of our algorithm using all features and 200 training examples per class to baselines using only the “tiny images” feature and random guessing. “Sky”, “Forest”, and “Building Facade” make up a large portion of the test set, and thus random guessing can achieve significant AP (the sketch below illustrates why).
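    To see why random guessing already scores well on frequent classes: for uninformative scores, expected average precision is approximately the positive-class frequency, so a class covering 40% of the test set has a random-guessing AP floor near 0.4. A minimal sketch with synthetic labels (the 40% and 2% class frequencies are invented, not the paper's test-set statistics):

```python
# Sketch: with random scores, average precision (AP) lands near the positive
# rate, so frequent classes like "sky" give random guessing a high AP floor.
# The 40% / 2% class frequencies below are invented for illustration.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n = 10_000
y_frequent = rng.random(n) < 0.40  # e.g. a "sky"-like frequent class
y_rare = rng.random(n) < 0.02      # a rare class

random_scores = rng.random(n)
print("random AP, frequent:", average_precision_score(y_frequent, random_scores))
print("random AP, rare:    ", average_precision_score(y_rare, random_scores))
```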
    Funding
    • This work is funded by NSF CAREER awards 0546262 to A.O. and 0747120 to A.T., and partly funded by BAE Systems under Subcontract No. 073692 (Prime Contract No. HR0011-08-C-0134 issued by DARPA), Foxconn, and gifts from Google and Microsoft.
    • K.A.E. is funded by an NSF Graduate Research Fellowship.
    References
    • E. H. Adelson. On seeing stuff: The perception of materials by humans and machines. Proceedings of the SPIE, (4299), 2001.
    • T. Ahonen, J. Matas, C. He, and M. Pietikainen. Rotation invariant image description with local binary pattern histogram Fourier features. In SCIA, 2009.
    • K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. Jordan. Matching words and pictures. J. of Machine Learning Research, 3:1107–1135, Feb. 2003.
    • N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2005.
    • R. Epstein and N. Kanwisher. A cortical representation of the local visual environment. Nature, 392:598–601, 1998.
    • M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. Intl. J. Computer Vision, September 2009.
    • L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 524–531, 2005.
    • C. Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.
    • P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Trans. on Pattern Analysis and Machine Intelligence.
    • G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report 7694, California Institute of Technology, 2007.
    • J. Hays and A. A. Efros. im2gps: estimating geographic information from a single image. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
    • D. Hoiem, A. Efros, and M. Hebert. Geometric context from a single image. In Proc. IEEE Intl. Conf. on Computer Vision, 2005.
    • D. Hoiem, A. Efros, and M. Hebert. Recovering surface layout from an image. Intl. J. Computer Vision, 75(1), 2007.
    • P. Jolicoeur, M. Gluck, and S. Kosslyn. Pictures and names: Making the connection. Cognitive Psychology, 16:243–275, 1984.
    • J. Kosecka and W. Zhang. Video compass. In Proc. European Conf. on Computer Vision, pages 476–490, 2002.
    • J.-F. Lalonde, D. Hoiem, A. A. Efros, C. Rother, J. Winn, and A. Criminisi. Photo clip art. ACM Transactions on Graphics (SIGGRAPH), 26(3), 2007.
    • S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 2169–2178, 2006.
    • D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. IEEE Intl. Conf. on Computer Vision, 2001.
    • J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10):761–767, 2004.
    • T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002.
    • A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. Intl. J. Computer Vision, 42:145–175, 2001.
    • J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
    • L. Renninger and J. Malik. When is scene recognition just texture recognition? Vision Research, 44:2301–2311, 2004.
    • E. Rosch. Natural categories. Cognitive Psychology, 4:328–350, 1973.
    • E. Rosch, C. Mervis, W. Gray, D. Johnson, and P. Boyes-Braem. Basic objects in natural categories. Cognitive Psychology, 8:382–439, 1976.
    • E. Shechtman and M. Irani. Matching local self-similarities across images and videos. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
    • J. Sivic and A. Zisserman. Video data mining using configurations of viewpoint invariant regions. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 2004.
    • A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: a large database for non-parametric object and scene recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, November 2008.
    • A. Torralba, K. Murphy, W. Freeman, and M. Rubin. Context-based vision system for place and object recognition. In Proc. IEEE Intl. Conf. on Computer Vision, 2003.
    • B. Tversky and K. Hemenway. Categories of environmental scenes. Cognitive Psychology, 15:121–149, 1983.
    • J. Vogel and B. Schiele. A semantic typicality measure for natural scene categorization. In DAGM Symposium on Pattern Recognition, 2004.
    • J. Vogel and B. Schiele. Semantic model of natural scenes for content-based image retrieval. Intl. J. Computer Vision, 72:133–157, 2007.