Dynamic Graph CNN for Learning on Point Clouds

    ACM Transactions on Graphics, pp. 1-12, 2019.

    Keywords:
    Point cloud, classification, segmentation

    Abstract:

    Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neu...

    Introduction
    • Point clouds, or scattered collections of points in 2D or 3D, are arguably the simplest shape representation; they also comprise the output of 3D sensing technology, including LiDAR scanners and stereo reconstruction.
    • Rather than identifying salient geometric features like corners and edges, recent algorithms search for semantic cues and affordances.
    • These features do not fit cleanly into the frameworks of computational or differential geometry and typically require learning-based approaches that derive relevant information through statistical analysis of labeled or unlabeled datasets
    Highlights
    • Point clouds, or scattered collections of points in 2D or 3D, are arguably the simplest shape representation; they comprise the output of 3D sensing technology including LiDAR scanners and stereo reconstruction
    • Our Dynamic Graph CNN is related to two classes of approaches, PointNets and graph convolutional neural networks, which we show to be particular settings of our method
    • We evaluate our model on the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [1] for a semantic scene segmentation task
    • In this work we propose a new operator for learning on point clouds and show its performance on various tasks
    • The success of our technique verifies our hypothesis that local geometric features are crucial to 3D recognition tasks, even after introducing machinery from deep learning
    • We will consider applications of our techniques to more abstract point clouds coming from applications like document retrieval rather than 3D geometry; beyond broadening the applicability of our technique, these experiments will provide insight into the role of geometry in abstract data processing
    Methods
    • Voxelization is a straightforward way to convert unstructured geometric data to a regular 3D grid over which standard CNN operations can be applied [30, 54]
    • These volumetric representations are often wasteful, since voxelization produces a sparsely-occupied 3D grid.
    • In PointNet, the key ingredient is a symmetric function applied to 3D coordinates in a manner invariant to permutation
    • While they achieve impressive performance on point cloud analysis tasks, PointNets treat each point individually, essentially learning a mapping from 3D to the latent features without leveraging local geometric structure.
    • The learned mapping is sensitive to the global transformation of the point cloud; to cope with this issue, PointNet employs a complex and computationally expensive spatial transformer network [16] to learn 3D alignment
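    The two observations above (voxel grids are sparsely occupied, and a symmetric function makes the output independent of point order) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the random cloud, the grid resolution of 32, and `pointwise_feature`, which is a hand-written stand-in for a learned per-point mapping rather than anything from the paper.

```python
import random


def voxel_occupancy(points, resolution):
    """Fraction of cells in a resolution^3 grid that hold at least one point.

    Assumes every coordinate lies in [0, 1).
    """
    occupied = set()
    for x, y, z in points:
        cell = (int(x * resolution), int(y * resolution), int(z * resolution))
        occupied.add(cell)
    return len(occupied) / resolution ** 3


def pointwise_feature(point):
    """Hypothetical stand-in for a learned per-point mapping (not a real network)."""
    x, y, z = point
    return (x + y, y * z, x - z)


def global_feature(points):
    """Channel-wise max over per-point features: a symmetric function of the set."""
    features = [pointwise_feature(p) for p in points]
    return tuple(max(channel) for channel in zip(*features))


random.seed(0)
cloud = [(random.random(), random.random(), random.random()) for _ in range(1024)]
shuffled = cloud[:]
random.shuffle(shuffled)

# 1024 points can fill at most 1024 of the 32^3 = 32768 cells.
occupancy = voxel_occupancy(cloud, resolution=32)
# Max pooling ignores the order in which points are listed.
same = global_feature(cloud) == global_feature(shuffled)

print(f"occupied voxel fraction: {occupancy:.4f}")
print(f"feature unchanged by shuffling: {same}")
```

    Note that the pooled feature depends on each point only through `pointwise_feature`, which looks at one point at a time; this is the sense in which such a mapping does not use local geometric structure.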
    Results
    • The authors' model achieves the best results on the ModelNet40 classification benchmark (Table 1), measured by mean class accuracy and overall accuracy.
    • The best-performing variant is an advanced version that includes a local-aware network and dynamical graph recomputation.
    • The mean IoU is calculated by averaging the IoUs of all the testing shapes.
    • The evaluation results are shown in Table 5.
    • The authors visually compare the results of the model and PointNet in Figure 10.
    • Qualitative results are shown in Figure 12, compared with ground truth.
    • The authors' model faithfully captures orientation even in the presence of fairly sharp features
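    The mean IoU described above, an average of per-shape IoUs over all test shapes, can be sketched as follows. The per-shape score used here (the mean of per-part IoUs, with a part absent from both prediction and ground truth scored as 1.0) is an assumed convention for illustration, and the function names and toy labels are hypothetical.

```python
def shape_iou(pred, truth, part_labels):
    """IoU of one shape: mean of per-part IoUs over the given part labels.

    Assumed convention: a part missing from both prediction and ground
    truth counts as IoU 1.0.
    """
    ious = []
    for part in part_labels:
        inter = sum(1 for p, t in zip(pred, truth) if p == part and t == part)
        union = sum(1 for p, t in zip(pred, truth) if p == part or t == part)
        ious.append(1.0 if union == 0 else inter / union)
    return sum(ious) / len(ious)


def mean_iou(shapes, part_labels):
    """Average the per-shape IoUs over all test shapes."""
    per_shape = [shape_iou(pred, truth, part_labels) for pred, truth in shapes]
    return sum(per_shape) / len(per_shape)


# Each entry is (predicted per-point labels, ground-truth per-point labels).
shapes = [
    ([0, 0, 1, 1], [0, 0, 1, 1]),  # every point labelled correctly
    ([0, 0, 0, 1], [0, 0, 1, 1]),  # one point of part 1 mislabelled as part 0
]
print(mean_iou(shapes, part_labels=[0, 1]))
```

    Averaging per shape gives every test shape equal weight regardless of how many points it contains.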
    Conclusion
    • In this work the authors propose a new operator for learning on point clouds and show its performance on various tasks.
    • The success of the model suggests that intrinsic features can be as valuable as point coordinates, if not more so; developing a practical and theoretically justified framework for balancing intrinsic and extrinsic considerations in a learning pipeline will require insight from theory and practice in geometry processing.
    • Another possible extension is to design a non-shared transformer network that works on each local patch differently, adding flexibility to the model.
    • The authors will consider applications of the techniques to more abstract point clouds coming from applications like document retrieval rather than 3D geometry; beyond broadening the applicability of the technique, these experiments will provide insight into the role of geometry in abstract data processing
    Tables
    • Table 1: Classification results on ModelNet40
    • Table 2: Complexity, forward time and accuracy of different models
    • Table 3: Effectiveness of different components. CENT denotes centralization, DYN denotes dynamical graph recomputation, and XFORM denotes the use of a spatial transformer
    • Table 4: Results of our model with different numbers of nearest neighbors
    • Table 5: Part segmentation results on ShapeNet part dataset. Metric is mIoU (%) on points
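    Tables 3 and 4 refer to nearest-neighbor graphs and to dynamical graph recomputation. As a rough sketch of what those terms could mean in practice, the code below builds a k-nearest-neighbor graph and rebuilds it from the current features before each layer. This is not the paper's operator: `toy_layer` is a hypothetical placeholder that only averages neighbors, and the points, `k`, and number of layers are made up for illustration.

```python
import math


def knn_graph(features, k):
    """For each point, indices of its k nearest neighbors (Euclidean), excluding itself."""
    neighbors = []
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def toy_layer(features, neighbors):
    """Hypothetical placeholder for a learned layer: average a point with its neighbors."""
    out = []
    for fi, nbrs in zip(features, neighbors):
        group = [fi] + [features[j] for j in nbrs]
        out.append(tuple(sum(channel) / len(group) for channel in zip(*group)))
    return out


points = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
    (5.0, 5.0, 5.0), (5.0, 6.0, 5.0),
]
k = 2

# Fixed graph: neighbors are found once, from the input coordinates.
static_graph = knn_graph(points, k)

# Dynamic graph: neighbors are recomputed from the current features at every layer.
features = points
for _ in range(2):
    graph = knn_graph(features, k)
    features = toy_layer(features, graph)

print(static_graph)
print(graph)
```

    Varying `k` in a sketch like this corresponds to the kind of comparison Table 4 reports, and replacing the loop body with a single precomputed graph corresponds to switching off DYN in Table 3.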
    Related work
    • Hand-Crafted Features. Various tasks in geometric data processing and analysis, including segmentation, classification, and matching, require some notion of local similarity between shapes. Traditionally, this similarity is established by constructing feature descriptors that capture local geometric structure. Countless papers in computer vision and graphics propose local feature descriptors for point clouds suitable for different problems and data structures. A comprehensive overview of hand-designed point features is beyond the scope of this paper, but we refer the reader to [51, 15, 4] for a thorough discussion.

      Broadly speaking, one can distinguish between extrinsic and intrinsic descriptors. Extrinsic descriptors are usually derived from the coordinates of the shape in 3D space and include classical methods like shape context [3], spin images [17], integral features [27], distance-based descriptors [24], point feature histograms [39, 38], and normal histograms [50], to name a few. Intrinsic descriptors treat the 3D shape as a manifold whose metric structure is discretized as a mesh or graph; quantities expressed in terms of the metric are by definition intrinsic and invariant to isometric deformation. Representatives of this class include spectral descriptors such as global point signatures [37], the heat and wave kernel signatures [48, 2], and variants [8]. Most recently, several approaches wrap machine learning schemes around standard descriptors [15, 42].
    References
    • I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese. 3d semantic parsing of large-scale indoor spaces. In Proc. CVPR, 2016.
    • M. Aubry, U. Schlickewei, and D. Cremers. The wave kernel signature: A quantum mechanical approach to shape analysis. In Proc. ICCV Workshops, 2011.
    • S. Belongie, J. Malik, and J. Puzicha. Shape context: A new descriptor for shape matching and object recognition. In Proc. NIPS, 2001.
    • S. Biasotti, A. Cerri, A. Bronstein, and M. Bronstein. Recent trends, applications, and perspectives in 3d shape similarity assessment. Computer Graphics Forum, 35(6):87–119, 2016.
    • D. Boscaini, J. Masci, S. Melzi, M. M. Bronstein, U. Castellani, and P. Vandergheynst. Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks. Computer Graphics Forum, 34(5):13–23, 2015.
    • D. Boscaini, J. Masci, E. Rodola, and M. Bronstein. Learning shape correspondence with anisotropic convolutional neural networks. In Proc. NIPS, 2016.
    • M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
    • M. M. Bronstein and I. Kokkinos. Scale-invariant heat kernel signatures for non-rigid shape recognition. In Proc. CVPR, 2010.
    • J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. arXiv:1312.6203, 2013.
    • A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, et al. Shapenet: An information-rich 3d model repository. arXiv:1512.03012, 2015.
    • M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Proc. NIPS, 2016.
    • F. Engelmann, T. Kontogianni, A. Hermans, and B. Leibe. Exploring spatial context for 3d semantic segmentation of point clouds. In Proc. CVPR, 2017.
    • D. Ezuz, J. Solomon, V. G. Kim, and M. Ben-Chen. Gwcnn: A metric alignment layer for deep shape analysis. Computer Graphics Forum, 36(5):49–57, 2017.
    • A. Golovinskiy, V. G. Kim, and T. Funkhouser. Shape-based recognition of 3d point clouds in urban environments. In Proc. ICCV, 2009.
    • Y. Guo, M. Bennamoun, F. Sohel, M. Lu, and J. Wan. 3d object recognition in cluttered scenes with local surface features: a survey. Trans. PAMI, 36(11):2270–2287, 2014.
    • M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks. In Proc. NIPS, 2015.
    • A. E. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. Trans. PAMI, 21(5):433–449, 1999.
    • D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. 2015.
    • T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. 2017.
    • R. Klokov and V. Lempitsky. Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. 2017.
    • A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Proc. NIPS, 2012.
    • Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.
    • R. Levie, F. Monti, X. Bresson, and M. M. Bronstein. Cayleynets: Graph convolutional neural networks with complex rational spectral filters. arXiv:1705.07664, 2017.
    • H. Ling and D. W. Jacobs. Shape classification using the inner-distance. Trans. PAMI, 29(2):286–299, 2007.
    • O. Litany, T. Remez, E. Rodola, A. M. Bronstein, and M. M. Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. In Proc. ICCV, 2017.
    • M. Lu, Y. Guo, J. Zhang, Y. Ma, and Y. Lei. Recognizing objects in 3d point clouds with multi-scale local features. Sensors, 14(12):24156–24173, 2014.
    • S. Manay, D. Cremers, B.-W. Hong, A. J. Yezzi, and S. Soatto. Integral invariants for shape matching. Trans. PAMI, 28(10):1602–1618, 2006.
    • H. Maron, M. Galun, N. Aigerman, M. Trope, N. Dym, E. Yumer, V. G. Kim, and Y. Lipman. Convolutional neural networks on surfaces via seamless toric covers. In Proc. SIGGRAPH, 2017.
    • J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst. Geodesic convolutional neural networks on riemannian manifolds. In Proc. 3dRR, 2015.
    • D. Maturana and S. Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In Proc. IROS, 2015.
    • F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In Proc. CVPR, 2017.
    • M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas. Functional maps: a flexible representation of maps between shapes. TOG, 31(4):30, 2012.
    • C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas. Frustum pointnets for 3d object detection from rgb-d data. arXiv:1711.08488, 2017.
    • C. R. Qi, H. Su, K. Mo, and L. J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proc. CVPR, 2017.
    • C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas. Volumetric and multi-view cnns for object classification on 3d data. In Proc. CVPR, 2016.
    • C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Proc. NIPS, 2017.
    • R. M. Rustamov. Laplace-beltrami eigenfunctions for deformation invariant shape representation. In Proc. SGP, 2007.
    • R. B. Rusu, N. Blodow, and M. Beetz. Fast point feature histograms (fpfh) for 3d registration. In Proc. ICRA, 2009.
    • R. B. Rusu, N. Blodow, Z. C. Marton, and M. Beetz. Aligning point cloud views using persistent feature histograms. In Proc. IROS, 2008.
    • R. B. Rusu, Z. C. Marton, N. Blodow, M. Dolha, and M. Beetz. Towards 3D Point Cloud Based Object Maps for Household Environments. Robotics and Autonomous Systems Journal, 56(11):927–941, 30
    • F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model. IEEE Tran. Neural Networks, 20(1):61–80, 2009.
    • S. A. A. Shah, M. Bennamoun, F. Boussaid, and A. A. El-Sallam. 3d-div: A novel local surface descriptor for feature matching and pairwise range image registration. In Proc. ICIP, 2013.
    • Y. Shen, C. Feng, Y. Yang, and D. Tian. Neighbors do help: Deeply exploiting local structures of point clouds. arXiv:1712.06760, 2017.
    • D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.
    • M. Simonovsky and N. Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proc. CVPR, 2017.
    • A. Sinha, J. Bai, and K. Ramani. Deep learning 3d shape surfaces using geometry images. In Proc. ECCV, 2016.
    • H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In Proc. CVPR, 2015.
    • J. Sun, M. Ovsjanikov, and L. Guibas. A concise and provably informative multi-scale signature based on heat diffusion. Computer Graphics Forum, 28(5):1383–1392, 2009.
    • M. Tatarchenko, A. Dosovitskiy, and T. Brox. Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs. In Proc. ICCV, 2017.
    • F. Tombari, S. Salti, and L. Di Stefano. A combined texture-shape descriptor for enhanced 3d feature matching. In Proc. ICIP, 2011.
    • O. Van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or. A survey on shape correspondence. Computer Graphics Forum, 30(6):1681–1707, 2011.
    • P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio. Graph attention networks. arXiv:1710.10903, 2017.
    • L. Wei, Q. Huang, D. Ceylan, E. Vouga, and H. Li. Dense human body correspondences using convolutional networks. In Proc. CVPR, 2016.
    • Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proc. CVPR, 2015.
    • L. Yi, V. G. Kim, D. Ceylan, I. Shen, M. Yan, H. Su, A. Lu, Q. Huang, A. Sheffer, L. Guibas, et al. A scalable active framework for region annotation in 3d shape collections. TOG, 35(6):210, 2016.
    • L. Yi, H. Su, X. Guo, and L. Guibas. Syncspeccnn: Synchronized spectral cnn for 3d shape segmentation. In Proc. CVPR, 2017.
    • Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Proc. ICRA, 2017.