# Learning Representations and Generative Models for 3D Point Clouds

ICML, pp. 40-49, 2018.

Abstract:

Three-dimensional geometric data offer an excellent domain for studying representation learning and generative modeling. In this paper, we look at geometric data represented as point clouds. We introduce a deep autoencoder (AE) network with excellent reconstruction quality and generalization ability. The learned representations outperform...


Introduction

- Three-dimensional (3D) representations of real-life objects are a core tool for vision, robotics, medicine, augmented reality and virtual reality applications.
- Recent classification work on point clouds (PointNet; Qi et al., 2016a) sidesteps the irregular, unordered structure of point clouds by avoiding convolutions involving groups of points
- Another related issue with point clouds as a representation is that they are permutation invariant: any reordering of the rows of the point cloud matrix yields a point cloud that represents the same shape.
- This creates the need to make the encoded features permutation invariant
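The permutation-invariance point can be made concrete with a toy, PointNet-style symmetric encoder. This is a minimal sketch: the weights and sizes below are illustrative, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))           # N points with xyz coordinates
shuffled = rng.permutation(cloud, axis=0)    # rows reordered: same shape

def encode(points, weights):
    # Per-point linear features followed by max-pooling over points.
    # Max-pooling is symmetric, so the code is independent of point order.
    return np.max(points @ weights, axis=0)

weights = rng.normal(size=(3, 16))           # illustrative feature weights
print(np.allclose(encode(cloud, weights), encode(shuffled, weights)))  # True
```

Any symmetric aggregation over per-point features (max, sum, mean) yields the same latent code for every reordering of the rows.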

Highlights

- Three-dimensional (3D) representations of real-life objects are a core tool for vision, robotics, medicine, augmented reality and virtual reality applications
- One workflow that we propose is to first train an AE to learn a latent representation and then train a generative model in that fixed latent space
- Qualitative results: in Fig. 5, we show some synthetic results produced by our l-GAN and the 32-component GMM
- We presented a novel set of architectures for 3D point cloud representation learning and generation
- Our extensive experiments show that the best generative model for point clouds is a GMM trained in the fixed latent space of an AE
- A thorough investigation of the conditions under which simple latent GMMs are as powerful as adversarially trained models would be of significant interest
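The proposed workflow (train an AE first, then fit a generative model in its fixed latent space) can be sketched end to end. The latent codes below are random stand-ins for the output of a pre-trained AE encoder, and the sizes (128-d codes, 32 components, full covariances) only loosely mirror the paper's setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Stand-in for latent codes produced by a pre-trained point-cloud AE encoder.
latent_codes = rng.normal(size=(2000, 128))

# Fit a GMM in the *fixed* latent space of the AE.
gmm = GaussianMixture(n_components=32, covariance_type="full", random_state=0)
gmm.fit(latent_codes)

# Sample new latent vectors; with the real model these would then be
# passed through the AE decoder to produce novel point clouds.
samples, _ = gmm.sample(64)
print(samples.shape)  # (64, 128)
```

The appeal of this two-stage design is that the hard geometric problem (reconstruction) and the easy density-estimation problem (a GMM fit by EM) are decoupled.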

Results

**Evaluation Metrics for Generative Models**

An important component of this work is the introduction of measures that enable comparisons between two sets of point clouds A and B.
- These metrics are useful for assessing the degree to which point clouds, synthesized or reconstructed, represent the same population as a held-out test set.
- The authors' three measures are described below
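The measure definitions themselves are not reproduced above. As a rough sketch of two of them, MMD and Coverage under the Chamfer distance, the following could be used; the function names and exact matching conventions here are my assumptions, not the paper's code.

```python
import numpy as np

def chamfer(p, q):
    # Symmetric Chamfer distance between clouds p (N, 3) and q (M, 3):
    # average squared nearest-neighbor distance, in both directions.
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def mmd_and_coverage(gen, ref):
    # Pairwise cloud-to-cloud distances between generated and reference sets.
    d = np.array([[chamfer(g, r) for r in ref] for g in gen])
    # Coverage: match each generated cloud to its nearest reference cloud
    # and report the fraction of reference clouds matched at least once.
    cov = len(set(d.argmin(axis=1))) / len(ref)
    # MMD: for each reference cloud, its distance to the nearest
    # generated cloud, averaged over the reference set.
    mmd = d.min(axis=0).mean()
    return mmd, cov

rng = np.random.default_rng(0)
clouds = [rng.normal(size=(64, 3)) for _ in range(5)]
print(mmd_and_coverage(clouds, clouds))  # identical sets: MMD 0, Coverage 1
```

Under these conventions, a generator that memorizes one training cloud achieves low MMD on nearby references but very poor Coverage, which is why the two metrics complement each other.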

Conclusion

- The complementary nature of MMD and Coverage directly follows from their definitions.
- The authors' extensive experiments show that the best generative model for point clouds is a GMM trained in the fixed latent space of an AE.
- While this might not be a universal result, it suggests that simple classic tools should not be dismissed.
- A thorough investigation of the conditions under which simple latent GMMs are as powerful as adversarially trained models would be of significant interest


- Table 1: Generalization of AEs as captured by MMD. Measurements for reconstructions on the training and test splits for an AE trained with either the CD or EMD loss and data of the chair class. Note how the MMD favors the AE that was trained with the same loss as the one used by the MMD to make the matching.
- Table 2: Classification performance (in %) on ModelNet10/40. Comparing to A: SPH (Kazhdan et al., 2003), B: LFD (Chen et al., 2003), C: T-L-Net (Girdhar et al., 2016), D: VConv-DAE (Sharma et al., 2016), E: 3D-GAN (Wu et al., 2016).
- Table 3: Evaluating 5 generators on the test split of the chair dataset, with epochs/models selected via minimal JSD on the validation split. We report: A: sampling-based memorization baseline, B: r-GAN, C: l-GAN (AE-CD), D: l-GAN (AE-EMD), E: l-WGAN (AE-EMD), F: GMM (AE-EMD).
- Table 4: Fidelity (MMD-EMD) and coverage (COV-EMD) comparison between A: Wu et al. (2016) and our GMM generative model on the test split of each class. Note that Wu et al. use all models of each class for training, contrary to our generators.
- Table 5: MMD and Coverage metrics evaluated on the output of voxel-based methods at resolution 64³, matched against the chair test set, using the same protocol as in Table 3. Comparing: A: "raw" 64³-voxel GAN (Wu et al., 2016) and a latent 64³-voxel GMM.
- Table 6: MMD-CD measurements for l-WGANs trained on the latent spaces of dedicated (left 5 columns) and multi-class EMD-AEs (right column). Also shown is the weighted average of the per-class values, using the number of train (Tr) resp. test (Te) examples of each class as weights. All l-WGANs use the model parameters obtained after 2000 epochs of training.
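Tables 1 and 3 distinguish AEs trained under the Chamfer (CD) versus Earth Mover's (EMD) losses. For equal-size clouds, EMD is the cost of an optimal one-to-one matching between the points; a sketch of the exact version via the Hungarian algorithm follows (the paper uses a faster approximation, so this is illustrative only).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd(p, q):
    # Earth Mover's Distance between equal-size clouds p and q (N, 3):
    # the average point-to-point distance under an optimal bijection,
    # solved exactly here with the Hungarian algorithm.
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(d)
    return d[rows, cols].mean()

rng = np.random.default_rng(0)
p = rng.normal(size=(128, 3))
print(emd(p, p))  # 0.0: a cloud has zero distance to itself
```

The exact assignment costs O(N³), which is why large-scale training relies on approximate EMD solvers.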

Related work

- Recently, deep learning architectures for view-based projections (Su et al, 2015; Wei et al, 2016; Kalogerakis et al, 2016), volumetric grids (Qi et al, 2016b; Wu et al, 2015; Hegde & Zadeh, 2016) and graphs (Bruna et al, 2013; Henaff et al, 2015; Defferrard et al, 2016; Yi et al, 2016) have appeared in the 3D machine learning literature.

A few recent works (Wu et al., 2016; Wang et al., 2016; Girdhar et al., 2016; Brock et al., 2016; Maimaitimin et al., 2017; Zhu et al., 2016) have explored generative and discriminative representations for geometry. They operate on different modalities, typically voxel grids or view-based image projections. To the best of our knowledge, our work is the first to study such representations for point clouds.

Training Gaussian mixture models (GMMs) in the latent space of an autoencoder is closely related to VAEs (Kingma & Welling, 2013). One documented issue with VAEs is over-regularization: the regularization term associated with the prior is often so strong that reconstruction quality suffers (Bowman et al., 2015; Sønderby et al., 2016; Kingma et al., 2016; Dilokthanakul et al., 2016). The literature contains methods that start with only a reconstruction penalty and slowly increase the weight of the regularizer. An alternative approach is based on adversarial autoencoders (Makhzani et al., 2015), which use a GAN to implicitly regularize the latent space of an AE.
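The "start with reconstruction only, then ramp up the regularizer" recipe amounts to annealing the weight on the KL term. A minimal sketch follows; the linear schedule shape and the constants are illustrative, not taken from any of the cited papers.

```python
def kl_weight(step, warmup_steps=10_000, max_beta=1.0):
    # Linear KL annealing: beta = 0 at the start (pure reconstruction),
    # rising to max_beta over warmup_steps training updates.
    return min(max_beta, max_beta * step / warmup_steps)

# Per-step VAE loss (sketch): reconstruction + kl_weight(t) * KL(q(z|x) || p(z))
print(kl_weight(0), kl_weight(5_000), kl_weight(20_000))  # 0.0 0.5 1.0
```

Keeping beta small early lets the encoder learn informative codes before the prior pulls them toward an isotropic Gaussian.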

Funding

- Last but not least, the authors wish to acknowledge the support of NSF grants IIS-1528025 and DMS-1546206, ONR MURI grant N00014-13-1-0341, a Google Focused Research award, and a gift from Amazon Web Services for Machine Learning Research.

References

- Girdhar, R., Fouhey, D. F., Rodriguez, M., and Gupta, A. Learning a predictable and generative vector representation for objects. In ECCV, 2016.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In NIPS, 2014.
- Arora, S. and Zhang, Y. Do GANs actually learn the distribution? An empirical study. CoRR, abs/1706.08224, 2017.
- Bogo, F., Romero, J., Pons-Moll, G., and Black, M. J. Dynamic FAUST: Registering human bodies in motion. In IEEE CVPR, 2017.
- Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio, S. Generating sentences from a continuous space. CoRR, abs/1511.06349, 2015.
- Brock, A., Lim, T., Ritchie, J. M., and Weston, N. Generative and discriminative voxel modeling with convolutional neural networks. CoRR, abs/1608.04236, 2016.
- Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. CoRR, abs/1312.6203, 2013.
- Chang, A. X., Funkhouser, T. A., Guibas, L. J., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., Xiao, J., Yi, L., and Yu, F. ShapeNet: An information-rich 3D model repository. CoRR, abs/1512.03012, 2015.
- Che, T., Li, Y., Jacob, A. P., Bengio, Y., and Li, W. Mode regularized generative adversarial networks. CoRR, abs/1612.02136, 2016.
- Chen, D.-Y., Tian, X.-P., Shen, Y.-T., and Ouhyoung, M. On Visual Similarity Based 3D Model Retrieval. Computer Graphics Forum, 2003.
- Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.
- Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society, 39(1), 1977.
- Dilokthanakul, N., Mediano, P. A., Garnelo, M., Lee, M. C., Salimbeni, H., Arulkumaran, K., and Shanahan, M. Deep unsupervised clustering with gaussian mixture variational autoencoders. CoRR, abs/1611.02648, 2016.
- Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of wasserstein gans. CoRR, abs/1704.00028, 2017.
- Hegde, V. and Zadeh, R. Fusionnet: 3d object classification using multiple data representations. CoRR, abs/1607.05695, 2016.
- Henaff, M., Bruna, J., and LeCun, Y. Deep convolutional networks on graph-structured data. CoRR, abs/1506.05163, 2015.
- Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
- Kalogerakis, E., Averkiou, M., Maji, S., and Chaudhuri, S. 3d shape segmentation with projective convolutional networks. CoRR, abs/1612.02808, 2016.
- Kazhdan, M., Funkhouser, T., and Rusinkiewicz, S. Rotation invariant spherical harmonic representation of 3d shape descriptors. In ACM SGP, 2003.
- Kingma, D. P. and Welling, M. Auto-encoding variational bayes. CoRR, abs/1312.6114, 2013.
- Kingma, D. P., Salimans, T., and Welling, M. Improving variational inference with inverse autoregressive flow. CoRR, abs/1606.04934, 2016.
- Kullback, S. and Leibler, R. A. On information and sufficiency. Annals of Mathematical Statistics, 1951.
- Lewiner, T., Lopes, H., Vieira, A. W., and Tavares, G. Efficient implementation of marching cubes’ cases with topological guarantees. Journal of Graphics Tools, 2003.
- Maas, A. L., Hannun, A. Y., and Ng, A. Y. Rectifier nonlinearities improve neural network acoustic models. In ICML, 2013.
- Maimaitimin, M., Watanabe, K., and Maeyama, S. Stacked convolutional auto-encoders for surface recognition based on 3d point cloud data. Artificial Life and Robotics, 2017.
- Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. Adversarial autoencoders. CoRR, abs/1511.05644, 2015.
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546, 2013.
- Nair, V. and Hinton, G. E. Rectified linear units improve restricted boltzmann machines. In ICML, 2010.
- Qi, C. R., Su, H., Mo, K., and Guibas, L. J. Pointnet: Deep learning on point sets for 3d classification and segmentation. CoRR, abs/1612.00593, 2016a.
- Qi, C. R., Su, H., Nießner, M., Dai, A., Yan, M., and Guibas, L. J. Volumetric and multi-view cnns for object classification on 3d data. In IEEE CVPR, 2016b.
- Qi, C. R., Yi, L., Su, H., and Guibas, L. J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. CoRR, 2017.
- Rubner, Y., Tomasi, C., and Guibas, L. J. The earth mover’s distance as a metric for image retrieval. IJCV, 2000.
- Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning representations by back-propagating errors. Cognitive modeling, 5, 1988.
- Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training gans. In NIPS, 2016.
- Sharma, A., Grau, O., and Fritz, M. Vconv-dae: Deep volumetric shape learning without object labels. In ECCV Workshop, 2016.
- Sønderby, C. K., Raiko, T., Maaløe, L., Sønderby, S. K., and Winther, O. How to train deep variational autoencoders and probabilistic ladder networks. CoRR, abs/1602.02282, 2016.
- Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. G. Multi-view convolutional neural networks for 3d shape recognition. In 2015 IEEE ICCV, 2015.
- Tasse, F. P. and Dodgson, N. Shape2vec: Semantic-based descriptors for 3d shapes, sketches and images. ACM Trans. Graph., 2016.
- Wang, Y., Xie, Z., Xu, K., Dou, Y., and Lei, Y. An efficient and effective convolutional auto-encoder extreme learning machine network for 3d feature learning. Neurocomputing, 174, 2016.
- Wei, L., Huang, Q., Ceylan, D., Vouga, E., and Li, H. Dense human body correspondences using convolutional networks. In IEEE CVPR, 2016.
- Wu, J., Zhang, C., Xue, T., Freeman, B., and Tenenbaum, J. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R. (eds.), NIPS. 2016.
- Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In IEEE CVPR, 2015.
- Yi, L., Su, H., Guo, X., and Guibas, L. J. Syncspeccnn: Synchronized spectral CNN for 3d shape segmentation. CoRR, abs/1612.00606, 2016.
- Zhu, Z., Wang, X., Bai, S., Yao, C., and Bai, X. Deep learning representation using autoencoder for 3d shape retrieval. Neurocomputing, 2016.
