3D ShapeNets: A deep representation for volumetric shapes

    IEEE Conference on Computer Vision and Pattern Recognition, 2015.

    Keywords: 3D shape representation, deep learning, area under precision-recall curve, depth map, active object recognition
    TL;DR: To study 3D shape representation for objects, we propose a convolutional deep belief network to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid.

    Abstract:

    3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape representation in the loop. Apart...

    Introduction
    • Since the establishment of computer vision as a field five decades ago, 3D geometric shape has been considered one of the most important cues in object recognition.
    • 3D shape is not used in any state-of-the-art recognition methods (e.g. [11, 19]), mostly due to the lack of a good generic representation for 3D geometric shapes.
    • It is becoming increasingly important to have a strong 3D shape representation in modern computer vision systems.
    Highlights
    • Since the establishment of computer vision as a field five decades ago, 3D geometric shape has been considered to be one of the most important cues in object recognition
    • We show that our model can recognize objects in single-view 2.5D depth images and hallucinate the missing parts of depth maps
    • To study 3D shape representation, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid
    • We evaluate retrieval algorithms using two metrics: (1) mean area under the precision-recall curve (AUC) over all test queries; (2) mean average precision (MAP), where AP is defined as the average of the precision each time a positive sample is returned
    • We evaluate the accuracy by running our 3D ShapeNets model on the integration depth maps of both the first view and the selected second view
    • To study 3D shape representation for objects, we propose a convolutional deep belief network to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid
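The voxel representation described above can be made concrete. Below is a minimal sketch of turning points sampled from a mesh surface into a binary occupancy grid; the `voxelize` helper and its conversion details are illustrative assumptions, not the paper's exact pipeline (the paper uses a 30×30×30 grid of binary variables).

```python
import numpy as np

def voxelize(points, grid_size=30):
    """Map surface points onto a binary 3D occupancy grid.

    Hypothetical helper: the paper represents a shape as binary
    variables on a 30x30x30 voxel grid; this point-based conversion
    is an illustrative simplification of mesh voxelization.
    """
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    extent = (points.max(axis=0) - lo).max()
    # Uniform scale so the shape's aspect ratio is preserved.
    scale = (grid_size - 1) / extent
    idx = np.clip(np.floor((points - lo) * scale).astype(int), 0, grid_size - 1)
    grid = np.zeros((grid_size,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1  # mark occupied cells
    return grid
```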
    Methods
    • The authors choose 40 common object categories from ModelNet with 100 unique CAD models per category.
    • Figure 6 shows some shapes sampled from the trained model.
    • 3D Shape Classification and Retrieval.
    • Deep learning has been widely used as a feature extraction technique.
    • The authors are interested in how well the features learned by 3D ShapeNets compare with other state-of-the-art 3D mesh features.
    • The authors discriminatively finetune 3D ShapeNets by replacing the top layer with class labels and use the 5th layer as features.
    • The authors choose Light Field descriptor [8] (LFD, 4,700 dimensions) and Spherical Harmonic descriptor [18] (SPH, 544 dimensions), which performed best among all descriptors [28]
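As a sketch of how such penultimate-layer features are typically used for retrieval (the feature extraction itself is omitted; the `retrieve` function and its parameters are illustrative, not the paper's code):

```python
import numpy as np

def retrieve(query_feat, gallery_feats, k=5):
    """Rank gallery shapes by L2 distance in feature space and return
    the indices of the k nearest.  In the paper the feature vector is
    the 5th-layer activation of the fine-tuned network; this ranking
    step is a generic sketch around it."""
    d = np.linalg.norm(np.asarray(gallery_feats) - np.asarray(query_feat), axis=1)
    return np.argsort(d)[:k].tolist()
```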
    Results
    • Figure 7: 3D Mesh Retrieval. Precision-recall curves on the 40-class benchmark, comparing the Spherical Harmonic and Light Field descriptors with our finetuned 5th-layer features.
    • The authors summarize the results in Table 1 and Figure 7
    • Since both baseline mesh features (LFD and SPH) are rotation invariant, the performance achieved suggests that 3D ShapeNets learned this invariance during feature learning.
    • Despite using a significantly lower-resolution input than the baseline descriptors, 3D ShapeNets outperforms them by a large margin.
    • This demonstrates that the 3D deep learning model can learn better features from 3D data automatically.
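The retrieval metrics behind these results (AUC and MAP, defined in the Highlights) can be sketched directly. This computes AP as the average of the precision each time a positive sample is returned, and MAP as the mean over queries; it is the standard definition quoted earlier, not code from the paper.

```python
import numpy as np

def average_precision(relevance):
    """AP over a ranked list of 0/1 relevance labels: the average of
    the precision measured at each rank where a positive is returned."""
    rel = np.asarray(relevance, dtype=float)
    precision_at_rank = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float(precision_at_rank[rel == 1].mean())

def mean_average_precision(ranked_lists):
    """MAP: mean of per-query average precision."""
    return float(np.mean([average_precision(r) for r in ranked_lists]))
```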
    Conclusion
    • To study 3D shape representation for objects, the authors propose a convolutional deep belief network to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid.
    • The authors' model can jointly recognize and reconstruct objects from a single-view 2.5D depth map.
    • To train this 3D deep learning model, the authors construct ModelNet, a large-scale 3D CAD model dataset.
    • The authors' model significantly outperforms existing approaches on a variety of recognition tasks, and it is a promising approach for next-best-view planning.
    • All source code and the dataset are available at the project website.
    Tables
    • Table 1: Shape Classification and Retrieval Results.
    • Table 2: Accuracy for View-based 2.5D Recognition on the NYU dataset [23]. The first five rows are algorithms that use only depth information; the last two rows also use color information. Our 3D ShapeNets as a generative model performs reasonably well compared to the other methods. After discriminative fine-tuning, our method achieves the best performance by a large margin of over 10%.
    • Table 3: Comparison of Different Next-Best-View Selections Based on Recognition Accuracy from Two Views. Based on an algorithm's choice, we obtain the actual depth map for the next view and recognize the objects from the two views with our 3D ShapeNets to compute the accuracies.
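The next-best-view comparison in Table 3 rests on choosing the view expected to be most informative about the object class. A common formalization of this idea (not necessarily the paper's exact criterion) is to pick the candidate view that maximizes the mutual information between the class and the next observation; the discrete-observation model and all names below are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_info_gain(prior, obs_given_class):
    """prior[c] = p(class c | views so far);
    obs_given_class[o, c] = p(observation o | class c) for one view."""
    joint = np.asarray(obs_given_class, dtype=float) * prior  # p(o, c)
    p_obs = joint.sum(axis=1)                                 # p(o)
    gain = entropy(prior)
    for o in range(len(p_obs)):
        if p_obs[o] > 0:
            # subtract the expected posterior entropy E_o[H(c | o)]
            gain -= p_obs[o] * entropy(joint[o] / p_obs[o])
    return gain

def next_best_view(prior, candidate_views):
    """Index of the candidate view with the largest expected gain."""
    return int(np.argmax([expected_info_gain(prior, v) for v in candidate_views]))
```

With a two-class prior, a perfectly discriminative view yields a gain of ln 2 while an uninformative view yields 0, so the discriminative view is selected.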
    Related work
    • There has been a large body of insightful research on analyzing 3D CAD model collections. Most of the works [12, 7, 17] use an assembly-based approach to build deformable part-based models. These methods are limited to a specific class of shapes with small variations, with surface correspondence being one of the key problems in such approaches. Since we are interested in shapes across a variety of objects with large variations and part annotation is tedious and expensive, assembly-based modeling can be rather cumbersome. For surface reconstruction of corrupted scanning input, most related works [26, 3] are largely based on smooth interpolation or extrapolation. These ap-
    Funding
    • This work is supported by gift funds from Intel Corporation and Project X grant to the Princeton Vision Group, and a hardware donation from NVIDIA Corporation
    • Z.W. is also partially supported by Hong Kong RGC Fellowship
    Reference
    • N. Atanasov, B. Sankaran, J. Le Ny, T. Koletschka, G. J. Pappas, and K. Daniilidis. Hypothesis testing framework for active object detection. In ICRA, 2013.
    • N. Atanasov, B. Sankaran, J. L. Ny, G. J. Pappas, and K. Daniilidis. Nonmyopic view planning for active object detection. arXiv preprint arXiv:1309.5401, 2013.
    • M. Attene. A lightweight approach to repairing digitized polygon meshes. The Visual Computer, 2010.
    • P. J. Besl and N. D. McKay. A method for registration of 3-D shapes. PAMI, 1992.
    • I. Biederman. Recognition-by-components: a theory of human image understanding. Psychological Review, 1987.
    • F. G. Callari and F. P. Ferrie. Active object recognition: Looking for differences. IJCV, 2001.
    • S. Chaudhuri, E. Kalogerakis, L. Guibas, and V. Koltun. Probabilistic reasoning for assembly-based 3D modeling. ACM Transactions on Graphics (TOG), 2011.
    • D.-Y. Chen, X.-P. Tian, Y.-T. Shen, and M. Ouhyoung. On visual similarity based 3D model retrieval. Computer Graphics Forum, 2003.
    • J. Denzler and C. M. Brown. Information theoretic sensor data selection for active object recognition and state estimation. PAMI, 2002.
    • S. M. A. Eslami, N. Heess, and J. Winn. The Shape Boltzmann Machine: a strong model of object shape. In CVPR, 2012.
    • P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. PAMI, 2010.
    • T. Funkhouser, M. Kazhdan, P. Shilane, P. Min, W. Kiefer, A. Tal, S. Rusinkiewicz, and D. Dobkin. Modeling by example. ACM Transactions on Graphics (TOG), 2004.
    • S. Gupta, R. Girshick, P. Arbelaez, and J. Malik. Learning rich features from RGB-D images for object detection and segmentation. In ECCV, 2014.
    • G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 2002.
    • G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 2006.
    • Z. Jia, Y.-J. Chang, and T. Chen. Active view selection for object and pose recognition. In ICCV Workshops, 2009.
    • E. Kalogerakis, S. Chaudhuri, D. Koller, and V. Koltun. A probabilistic model for component-based shape synthesis. ACM Transactions on Graphics (TOG), 2012.
    • M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In SGP, 2003.
    • A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
    • H. Lee, C. Ekanadham, and A. Y. Ng. Sparse deep belief net model for visual area V2. In NIPS, 2007.
    • H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the ACM, 2011.
    • J. L. Mundy. Object recognition in the geometric era: A retrospective. In Toward Category-Level Object Recognition, 2006.
    • N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from RGBD images. In ECCV, 2012.
    • F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. IJCV, 2006.
    • W. Scott, G. Roth, and J.-F. Rivest. View planning for automated 3D object reconstruction and inspection. ACM Computing Surveys, 2003.
    • S. Shalom, A. Shamir, H. Zhang, and D. Cohen-Or. Cone carving for surface reconstruction. ACM Transactions on Graphics (TOG), 2010.
    • C.-H. Shen, H. Fu, K. Chen, and S.-M. Hu. Structure recovery by part assembly. ACM Transactions on Graphics (TOG), 2012.
    • P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser. The Princeton Shape Benchmark. In Shape Modeling Applications, 2004.
    • R. Socher, B. Huval, B. Bhat, C. D. Manning, and A. Y. Ng. Convolutional-recursive deep learning for 3D object classification. In NIPS, 2012.
    • S. Song and J. Xiao. Sliding Shapes for 3D object detection in RGB-D images. In ECCV, 2014.
    • J. Tang, S. Miller, A. Singh, and P. Abbeel. A textured object recognition pipeline for color and depth image data. In ICRA, 2012.
    • T. Tieleman and G. Hinton. Using fast weights to improve persistent contrastive divergence. In ICML, 2009.
    • J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.