Learning Representations and Generative Models for 3D Point Clouds

    ICML, pp. 40-49, 2018.

    Keywords:
    Generative Adversarial Networks, latent-space GAN, matchings, 3D point cloud, geometric data

    Abstract:

    Three-dimensional geometric data offer an excellent domain for studying representation learning and generative modeling. In this paper, we look at geometric data represented as point clouds. We introduce a deep autoencoder (AE) network with excellent reconstruction quality and generalization ability. The learned representations outperform…

    Introduction
    • Three-dimensional (3D) representations of real-life objects are a core tool for vision, robotics, medicine, augmented reality and virtual reality applications.
    • Recent classification work on point clouds (PointNet; Qi et al., 2016a) bypasses this issue by avoiding convolutions that involve groups of points
    • Another related issue with point clouds as a representation is that they are permutation invariant: any reordering of the rows of the point cloud matrix yields a point cloud that represents the same shape.
    • This creates the need to make the encoded feature permutation invariant (see the sketch below)
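    • To make this concrete, below is a minimal NumPy sketch (our illustration, not the paper's actual architecture) of how a symmetric pooling operation yields a permutation-invariant encoding; `encode` and its single shared layer are hypothetical stand-ins for a PointNet-style per-point MLP:

      import numpy as np

      def encode(point_cloud, weights):
          # Shared per-point map (a one-layer ReLU stand-in for a per-point MLP),
          # followed by a symmetric max pooling over the point axis.
          per_point = np.maximum(point_cloud @ weights, 0.0)  # shape (n, k)
          return per_point.max(axis=0)                        # shape (k,)

      rng = np.random.default_rng(0)
      cloud = rng.normal(size=(1024, 3))         # n points in R^3
      w = rng.normal(size=(3, 128))              # hypothetical shared weights
      shuffled = rng.permutation(cloud, axis=0)  # reorder the rows

      # Max pooling ignores row order, so any reordering yields the same code.
      assert np.allclose(encode(cloud, w), encode(shuffled, w))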
    Highlights
    • Three-dimensional (3D) representations of real-life objects are a core tool for vision, robotics, medicine, augmented reality and virtual reality applications
    • One workflow we propose is to first train an AE to learn a latent representation and then train a generative model in that fixed latent space (a sketch follows this list)
    • Qualitative results: in Fig. 5, we show synthetic results produced by our latent-space GAN (l-GAN) and the 32-component Gaussian Mixture Model (GMM)
    • We presented a novel set of architectures for 3D point cloud representation learning and generation
    • Our extensive experiments show that the best generative model for point clouds is a GMM trained in the fixed latent space of an AE
    • A thorough investigation into the conditions under which simple latent GMMs are as powerful as adversarially trained models would be of significant interest
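    • A minimal sketch of that workflow, assuming hypothetical `encoder`/`decoder` callables in place of the paper's trained point-cloud AE, with scikit-learn's GaussianMixture standing in for the latent GMM:

      import numpy as np
      from sklearn.mixture import GaussianMixture

      def encoder(clouds):  # hypothetical: (m, n, 3) point clouds -> (m, d) codes
          return clouds.mean(axis=1)                          # placeholder only
      def decoder(codes):   # hypothetical: (m, d) codes -> (m, n, 3) point clouds
          return np.repeat(codes[:, None, :], 1024, axis=1)   # placeholder only

      train_clouds = np.random.default_rng(0).normal(size=(500, 1024, 3))

      # 1) Encode the training set into the AE's fixed latent space.
      latents = encoder(train_clouds)
      # 2) Fit a 32-component GMM on the latent codes (full covariance is our
      #    choice here; the paper's exact GMM settings may differ).
      gmm = GaussianMixture(n_components=32, covariance_type="full").fit(latents)
      # 3) Sample latent vectors and decode them with the frozen decoder.
      z, _ = gmm.sample(n_samples=10)
      synthetic_clouds = decoder(z)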
    Results
    • Evaluation Metrics for Generative Models

      An important component of this work is the introduction of measures that enable comparisons between two sets of point clouds A and B.
    • These metrics are useful for assessing the degree to which point clouds, synthesized or reconstructed, represent the same population as a held-out test set.
    • The authors' three measures (JSD, MMD, and Coverage) are described below; a brute-force sketch of MMD and Coverage follows
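    • A brute-force sketch of MMD and Coverage under the Chamfer distance (our reading of the definitions; the paper also reports EMD-based variants, and this O(|A||B|) loop is illustrative rather than efficient):

      import numpy as np

      def chamfer_distance(a, b):
          # Chamfer (pseudo-)distance between point clouds a: (n, 3), b: (m, 3);
          # each point is matched to its nearest neighbor in the other cloud.
          d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
          return d2.min(axis=1).mean() + d2.min(axis=0).mean()

      def mmd_and_coverage(samples, reference):
          # MMD (fidelity): average distance from each reference cloud
          # to its nearest neighbor among the samples.
          mmd = np.mean([min(chamfer_distance(s, r) for s in samples)
                         for r in reference])
          # Coverage (diversity): fraction of reference clouds that are
          # the nearest neighbor of at least one sample.
          matched = {min(range(len(reference)),
                         key=lambda j: chamfer_distance(s, reference[j]))
                     for s in samples}
          return mmd, len(matched) / len(reference)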
    Conclusion
    • The complementary nature of MMD and Coverage follows directly from their definitions: MMD measures how faithfully the samples match the reference population, while Coverage measures how much of that population the samples reach.
    • The authors' extensive experiments show that the best generative model for point clouds is a GMM trained in the fixed latent space of an AE.
    • While this might not be a universal result, it suggests that simple classic tools should not be dismissed.
    • A thorough investigation into the conditions under which simple latent GMMs are as powerful as adversarially trained models would be of significant interest
    Tables
    • Table 1: Generalization of AEs as captured by MMD. Measurements for reconstructions on the training and test splits for an AE trained with either the CD or EMD loss on data of the chair class. Note how MMD favors the AE trained with the same loss as the one MMD uses to make the matching
    • Table 2: Classification performance (in %) on ModelNet10/40. Comparing to A: SPH (Kazhdan et al., 2003), B: LFD (Chen et al., 2003), C: T-L-Net (Girdhar et al., 2016), D: VConv-DAE (Sharma et al., 2016), E: 3D-GAN (Wu et al., 2016)
    • Table 3: Evaluating 5 generators on the test split of the chair dataset, at epochs/models selected via minimal JSD on the validation split. We report: A: sampling-based memorization baseline, B: r-GAN, C: l-GAN (AE-CD), D: l-GAN (AE-EMD), E: l-WGAN (AE-EMD), F: GMM (AE-EMD)
    • Table 4: Fidelity (MMD-EMD) and coverage (COV-EMD) comparison between A: Wu et al. (2016) and B: our GMM generative model on the test split of each class. Note that Wu et al. use all models of each class for training, unlike our generators
    • Table 5: MMD and Coverage metrics evaluated on the output of voxel-based methods at resolution 64³, matched against the chair test set, using the same protocol as in Table 3. Comparing A: "raw" 64³-voxel GAN (Wu et al., 2016) and B: a latent 64³-voxel GMM
    • Table 6: MMD-CD measurements for l-WGANs trained on the latent spaces of dedicated (left 5 columns) and multi-class EMD-AEs (right column). Also shown is the weighted average of the per-class values, using the number of training (Tr) resp. test (Te) examples of each class as weights. All l-WGANs use the model parameters resulting from 2000 epochs of training
    Funding
    • Last but not least, the authors wish to acknowledge the support of NSF grants IIS-1528025 and DMS-1546206, ONR MURI grant N00014-13-1-0341, a Google Focused Research Award, and a gift from Amazon Web Services for Machine Learning Research
    Reference
    • Arora, S. and Zhang, Y. Do GANs actually learn the distribution? An empirical study. CoRR, abs/1706.08224, 2017.
    • Bogo, F., Romero, J., Pons-Moll, G., and Black, M. J. Dynamic FAUST: Registering human bodies in motion. In IEEE CVPR, 2017.
    • Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. Generating sentences from a continuous space. CoRR, abs/1511.06349, 2015.
    • Brock, A., Lim, T., Ritchie, J. M., and Weston, N. Generative and discriminative voxel modeling with convolutional neural networks. CoRR, abs/1608.04236, 2016.
    • Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. CoRR, abs/1312.6203, 2013.
    • Chang, A. X., Funkhouser, T. A., Guibas, L. J., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., and Yu, F. ShapeNet: An information-rich 3D model repository. CoRR, abs/1512.03012, 2015.
    • Che, T., Li, Y., Jacob, A. P., Bengio, Y., and Li, W. Mode regularized generative adversarial networks. CoRR, abs/1612.02136, 2016.
    • Chen, D.-Y., Tian, X.-P., Shen, Y.-T., and Ouhyoung, M. On visual similarity based 3D model retrieval. Computer Graphics Forum, 2003.
    • Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.
    • Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1), 1977.
    • Dilokthanakul, N., Mediano, P. A., Garnelo, M., Lee, M. C., Salimbeni, H., Arulkumaran, K., and Shanahan, M. Deep unsupervised clustering with Gaussian mixture variational autoencoders. CoRR, abs/1611.02648, 2016.
    • Fan, H., Su, H., and Guibas, L. J. A point set generation network for 3D object reconstruction from a single image. In IEEE CVPR, 2017.
    • Girdhar, R., Fouhey, D. F., Rodriguez, M., and Gupta, A. Learning a predictable and generative vector representation for objects. In ECCV, 2016.
    • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In NIPS, 2014.
    • Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of Wasserstein GANs. CoRR, abs/1704.00028, 2017.
    • Hegde, V. and Zadeh, R. FusionNet: 3D object classification using multiple data representations. CoRR, abs/1607.05695, 2016.
    • Henaff, M., Bruna, J., and LeCun, Y. Deep convolutional networks on graph-structured data. CoRR, abs/1506.05163, 2015.
    • Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
    • Kalogerakis, E., Averkiou, M., Maji, S., and Chaudhuri, S. 3D shape segmentation with projective convolutional networks. CoRR, abs/1612.02808, 2016.
    • Kazhdan, M., Funkhouser, T., and Rusinkiewicz, S. Rotation invariant spherical harmonic representation of 3D shape descriptors. In ACM SGP, 2003.
    • Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. CoRR, abs/1312.6114, 2013.
    • Kingma, D. P., Salimans, T., and Welling, M. Improving variational inference with inverse autoregressive flow. CoRR, abs/1606.04934, 2016.
    • Kullback, S. and Leibler, R. A. On information and sufficiency. Annals of Mathematical Statistics, 1951.
    • Lewiner, T., Lopes, H., Vieira, A. W., and Tavares, G. Efficient implementation of marching cubes' cases with topological guarantees. Journal of Graphics Tools, 2003.
    • Maas, A. L., Hannun, A. Y., and Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. In ICML, 2013.
    • Maimaitimin, M., Watanabe, K., and Maeyama, S. Stacked convolutional auto-encoders for surface recognition based on 3D point cloud data. Artificial Life and Robotics, 2017.
    • Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. Adversarial autoencoders. CoRR, abs/1511.05644, 2015.
    • Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546, 2013.
    • Nair, V. and Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
    • Qi, C. R., Su, H., Mo, K., and Guibas, L. J. PointNet: Deep learning on point sets for 3D classification and segmentation. CoRR, abs/1612.00593, 2016a.
    • Qi, C. R., Su, H., Nießner, M., Dai, A., Yan, M., and Guibas, L. J. Volumetric and multi-view CNNs for object classification on 3D data. In IEEE CVPR, 2016b.
    • Qi, C. R., Yi, L., Su, H., and Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. CoRR, 2017.
    • Rubner, Y., Tomasi, C., and Guibas, L. J. The earth mover's distance as a metric for image retrieval. IJCV, 2000.
    • Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning representations by back-propagating errors. Cognitive Modeling, 5, 1988.
    • Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training GANs. In NIPS, 2016.
    • Sharma, A., Grau, O., and Fritz, M. VConv-DAE: Deep volumetric shape learning without object labels. In ECCV Workshop, 2016.
    • Sønderby, C. K., Raiko, T., Maaløe, L., Sønderby, S. K., and Winther, O. How to train deep variational autoencoders and probabilistic ladder networks. CoRR, abs/1602.02282, 2016.
    • Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. G. Multi-view convolutional neural networks for 3D shape recognition. In IEEE ICCV, 2015.
    • Tasse, F. P. and Dodgson, N. Shape2Vec: Semantic-based descriptors for 3D shapes, sketches and images. ACM Trans. Graph., 2016.
    • Wang, Y., Xie, Z., Xu, K., Dou, Y., and Lei, Y. An efficient and effective convolutional auto-encoder extreme learning machine network for 3D feature learning. Neurocomputing, 174, 2016.
    • Wei, L., Huang, Q., Ceylan, D., Vouga, E., and Li, H. Dense human body correspondences using convolutional networks. In IEEE CVPR, 2016.
    • Wu, J., Zhang, C., Xue, T., Freeman, B., and Tenenbaum, J. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In NIPS, 2016.
    • Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. 3D ShapeNets: A deep representation for volumetric shapes. In IEEE CVPR, 2015.
    • Yi, L., Su, H., Guo, X., and Guibas, L. J. SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation. CoRR, abs/1612.00606, 2016.
    • Zhu, Z., Wang, X., Bai, S., Yao, C., and Bai, X. Deep learning representation using autoencoder for 3D shape retrieval. Neurocomputing, 2016.