Deep Parametric Shape Predictions using Distance Fields

    CVPR 2020.

    Keywords: 3D shape, Alexei A. Efros, generative adversarial network, distance field, constructive solid geometry

    Abstract:

    Many tasks in graphics and vision demand machinery for converting shapes into representations with sparse sets of parameters; these representations facilitate rendering, editing, and storage. When the source data is noisy or ambiguous, however, artists and engineers often manually construct such representations, a tedious and potentially …

    Introduction
    • The creation, modification, and rendering of parametric shapes, such as in vector graphics, is a fundamental problem of interest to engineers, artists, animators, and designers.
    • Such representations offer distinct advantages.
    • It is often useful to generate parametric models from data that do not directly correspond to the target geometry and contain imperfections or missing parts.
    • This can be an artifact of noise, corruption, or human-generated input; often, an artist intends to create a precise geometric object but produces one that is “sketchy” and ambiguous.
    • The authors turn to machine learning methods, which have shown success in inferring structure from noisy data.
    Highlights
    • The creation, modification, and rendering of parametric shapes, such as in vector graphics, is a fundamental problem of interest to engineers, artists, animators, and designers.
    • We propose a learning framework for predicting parametric shapes, addressing the aforementioned issues.
    • We introduce a framework for formulating loss functions suitable for learning parametric shapes in 2D and 3D; our formulation generalizes Chamfer distance and leads to stronger loss functions that improve performance on a variety of tasks.
    • Representation is a key theme in deep learning—and machine learning more broadly—applied to geometry.
    • While considerable effort has been put into choosing representations for certain tasks, the tasks we consider have fixed representations for the input and output: They take in a shape as a function on a grid and output a sparse set of parameters.
    Methods
    • The authors introduce a framework for formulating loss functions suitable for learning parametric shapes in 2D and 3D; the formulation generalizes Chamfer distance and leads to stronger loss functions that improve performance on a variety of tasks.
    • The authors start by defining a general loss on distance fields and propose two specific losses.
    • General Distance Field Loss.
    • Given A, B ⊆ ℝⁿ, let d_A, d_B : ℝⁿ → ℝ₊ measure the distance from each point in ℝⁿ to A and B, respectively, with d_A(x) := inf_{y ∈ A} ‖x − y‖₂.
    • From these distance fields, the authors define a general distance field loss.
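    The point-to-set distance above can be illustrated with a short NumPy sketch (an illustration, not the authors' implementation; `distance_field` and `chamfer` are hypothetical names). It evaluates d_A at query points by brute force and uses it to form the symmetric Chamfer distance that the paper's loss generalizes:

```python
import numpy as np

def distance_field(points, queries):
    """d_A(x) = inf_{y in A} ||x - y||_2, evaluated at each query x.

    points:  (m, n) array, the set A sampled as m points of R^n.
    queries: (k, n) array of evaluation locations.
    Returns a (k,) array of distances to the nearest point of A.
    """
    diffs = queries[:, None, :] - points[None, :, :]   # (k, m, n) pairwise offsets
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

def chamfer(A, B):
    """Symmetric Chamfer distance between two sampled point sets:
    mean nearest-neighbor distance in each direction, summed."""
    return distance_field(B, A).mean() + distance_field(A, B).mean()
```

    The brute-force pairwise computation is O(km); at scale, a k-d tree or a precomputed distance field on a grid would replace it.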
    Results
    • The authors achieve a mean accuracy of 94.6%, exceeding the 89.0% accuracy of [41].
    Conclusion
    • Representation is a key theme in deep learning—and machine learning more broadly—applied to geometry.
    • Assorted means of communicating a shape to and from a deep network present varying tradeoffs between efficiency, quality, and applicability.
    • The authors' learning procedure is applicable to many additional tasks.
    • A natural step is to incorporate the network into more complex pipelines for tasks like vectorization of complex drawings [3], for which the output of a learning procedure needs to be combined with classical techniques to ensure smooth, topologically valid output.
    • A challenging direction might be to incorporate user guidance into training or evaluation, developing the algorithm as a partner in shape reconstruction rather than generating a deterministic output.
    Tables
    • Table 1: Comparison between subsets of our full model as well as standard Chamfer distance and AtlasNet. Average error is Chamfer distance (in pixels on a 128×128 image) between ground truth and uniformly sampled predicted curves
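    To make the table's metric concrete, the sketch below (assumed, not taken from the paper's code) samples a quadratic Bézier segment of the kind used in font outlines; Chamfer distance between two curves is then computed between such sample sets. Sampling at uniform parameter values t is a simplification, since uniform-in-t is not uniform in arc length:

```python
import numpy as np

def sample_quadratic_bezier(p0, p1, p2, k=128):
    """Sample k points on the quadratic Bezier curve with control points
    p0, p1, p2 (each a length-2 array), at uniform parameter values t in [0, 1]."""
    t = np.linspace(0.0, 1.0, k)[:, None]
    # Bernstein form: B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
```

    The resulting (k, 2) array of pixel coordinates can be fed directly to a point-set Chamfer distance.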
    Related work
    • Deep shape reconstruction. Reconstructing geometry from one or more viewpoints is crucial in applications like robotics and autonomous driving [13, 35, 38]. Recent deep networks can produce point clouds or voxel occupancy grids given a single image [12, 8], but their output suffers from fixed resolution.

      Learning signed distance fields defined on a voxel grid [9, 37] or directly [30] allows high-resolution rendering but requires surface extraction; this representation is neither sparse nor modular. Liao et al. address the rendering issue by incorporating marching cubes into a differentiable pipeline, but the lack of sparsity remains problematic, and predicted shapes are still on a voxel grid [23].

      Parametric shapes offer a sparse, non-voxelized solution. Methods for converting point clouds to geometric primitives achieve high-quality results but require supervision, either relying on existing labeled data [27, 26, 15] or prescribed templates [14]. Groueix et al. output primitives at any resolution, but their primitives are not naturally parameterized or sparsely represented [17]. Genova et al. propose to represent geometry as isosurfaces of axis-aligned Gaussians [16]. Others [17, 39, 31] develop tailored primitives but use standard Chamfer distance as the loss objective. We demonstrate and address the issues inherent in Chamfer distance.
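    For contrast with the voxel-based representations discussed above, a distance field can be recovered from a binary occupancy grid with SciPy's Euclidean distance transform. The sign convention (negative inside, positive outside) is an assumption here, not taken from the paper:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_field(occ):
    """Signed distance field from a boolean occupancy grid (True = inside).

    distance_transform_edt returns, for each nonzero cell, the Euclidean
    distance to the nearest zero cell, so the two calls below give the
    distance to the shape (for outside cells) and the distance to the
    background (for inside cells), respectively.
    """
    to_shape = distance_transform_edt(~occ)       # zero on occupied cells
    to_background = distance_transform_edt(occ)   # zero on empty cells
    return to_shape - to_background               # negative inside, positive outside
```

    Such grid-based fields are exactly the dense, non-sparse representation the parametric approach seeks to avoid, but they are useful for evaluating distance-field losses against ground-truth shapes.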
    Funding
    • The authors acknowledge the generous support of Army Research Office grant W911NF1710068, Air Force Office of Scientific Research award FA9550-19-1-031, National Science Foundation grant IIS-1838071, and a National Science Foundation Graduate Research Fellowship under Grant No. 1122374, as well as support from an Amazon Research Award, the MIT-IBM Watson AI Laboratory, the Toyota-CSAIL Joint Research Center, a gift from Adobe Systems, and the Skoltech-MIT Next Generation Program
    References
    • Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell. Multi-content gan for few-shot font style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 11, page 13, 2018. 2, 7
    • Elena Balashova, Amit Bermano, Vladimir G. Kim, Stephen DiVerdi, Aaron Hertzmann, and Thomas Funkhouser. Learning a stroke-based representation for fonts. CGF, 2018. 2
    • Mikhail Bessmeltsev and Justin Solomon. Vectorization of line drawings via polyvector fields. ACM Transactions on Graphics (TOG), 2019. 8
    • James F Blinn. A generalization of algebraic surface drawing. ACM Transactions on Graphics (TOG), 1(3):235–256, 1982. 8
    • Gunilla Borgefors. Distance transformations in arbitrary dimensions. Computer vision, graphics, and image processing, 27(3):321–345, 1984. 2
    • Neill DF Campbell and Jan Kautz. Learning a manifold of fonts. ACM Transactions on Graphics (TOG), 33(4):91, 2014. 2
    • Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. ShapeNet: An Information-Rich 3D Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University — Princeton University — Toyota Technological Institute at Chicago, 2015. 2, 8
    • Christopher B Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In European conference on computer vision, pages 628–644. Springer, 2016. 2
    • Angela Dai, Charles Ruizhongtai Qi, and Matthias Nießner. Shape completion using 3d-encoder-predictor cnns and shape synthesis. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), volume 3, 2017. 2
    • Tao Du, Jeevana Priya Inala, Yewen Pu, Andrew Spielberg, Adriana Schulz, Daniela Rus, Armando Solar-Lezama, and Wojciech Matusik. Inversecsg: Automatic conversion of 3d models to csg trees. In SIGGRAPH Asia 2018 Technical Papers, page 213. ACM, 2018. 8
    • David S Ebert and F Kenton Musgrave. Texturing & modeling: a procedural approach. Morgan Kaufmann, 2003. 4
    • Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In CVPR, volume 2, page 6, 2017. 1, 2
    • Jorge Fuentes-Pacheco, Jose Ruiz-Ascencio, and Juan Manuel Rendon-Mancha. Visual simultaneous localization and mapping: a survey. Artificial Intelligence Review, 43(1):55–81, 2015. 2
    • Vignesh Ganapathi-Subramanian, Olga Diamanti, Soeren Pirk, Chengcheng Tang, Matthias Niessner, and Leonidas Guibas. Parsing geometry using structure-aware shape templates. In 2018 International Conference on 3D Vision (3DV), pages 672–681. IEEE, 2018. 2
    • Jun Gao, Chengcheng Tang, Vignesh GanapathiSubramanian, Jiahui Huang, Hao Su, and Leonidas J Guibas. Deepspline: Data-driven reconstruction of parametric curves and surfaces. arXiv preprint arXiv:1901.03781, 2019. 2
    • Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T. Freeman, and Thomas Funkhouser. Learning shape templates with structured implicit functions. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019. 2
    • Thibault Groueix, Matthew Fisher, Vladimir G. Kim, Bryan Russell, and Mathieu Aubry. AtlasNet: A Papier-Mache Approach to Learning 3D Surface Generation. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2018. 1, 2, 5, 6
    • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 4
    • Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. CVPR, 2017. 1
    • Vladimir G Kim, Wilmot Li, Niloy J Mitra, Siddhartha Chaudhuri, Stephen DiVerdi, and Thomas Funkhouser. Learning part-based templates from large collections of 3d shapes. ACM Transactions on Graphics (TOG), 32(4):70, 2013. 2
    • Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 12 2014. 4
    • Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012. 1
    • Yiyi Liao, Simon Donne, and Andreas Geiger. Deep marching cubes: Learning explicit surface representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2916–2925, 2018. 2
    • Xia Liu and Kikuo Fujimura. Hand gesture recognition using depth data. In Proc. 6th IEEE Int. Conf. Automatic Face Gesture Recog., page 529. IEEE, 2004. 1, 2
    • Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015. 1
    • Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, and Leonidas Guibas. Structurenet: Hierarchical graph networks for 3d shape generation. ACM Transactions on Graphics (TOG), Siggraph Asia 2019, 38(6):Article 242, 2019. 2
    • Chengjie Niu, Jun Li, and Kai Xu. Im2struct: Recovering 3d shape structure from a single rgb image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4521–4529, 2018. 2
    • Peter O’Donovan, Janis Lıbeks, Aseem Agarwala, and Aaron Hertzmann. Exploratory font selection using crowdsourced attributes. ACM Transactions on Graphics (TOG), 33(4):92, 2014. 2
    • Maks Ovsjanikov, Wilmot Li, Leonidas Guibas, and Niloy J Mitra. Exploration of continuous variability in collections of 3d shapes. In ACM Transactions on Graphics (TOG), volume 30, page 33. ACM, 2011. 2
    • Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. 2
    • Despoina Paschalidou, Ali Osman Ulusoy, and Andreas Geiger. Superquadrics revisited: Learning 3d shape parsing beyond cuboids. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019. 2, 7
    • Huy Quoc Phan, Hongbo Fu, and Antoni B Chan. Flexyfont: Learning transferring rules for flexible typeface synthesis. In Computer Graphics Forum, volume 34, pages 245–256. Wiley Online Library, 2015. 2
    • Zheng Qin, Michael D McCool, and Craig S Kaplan. Realtime texture-mapped vector glyphs. In Proceedings of the 2006 symposium on Interactive 3D graphics and games, pages 125–132. ACM, 2006. 4
    • Adriana Schulz, Ariel Shamir, Ilya Baran, David IW Levin, Pitchaya Sitthi-Amorn, and Wojciech Matusik. Retrieval on parametric shape collections. ACM Transactions on Graphics (TOG), 36(1):11, 2017. 2
    • Steven M Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In CVPR, 2006. 2
    • Chao-Hui Shen, Hongbo Fu, Kang Chen, and Shi-Min Hu. Structure recovery by part assembly. ACM Transactions on Graphics (TOG), 31(6):180, 2012. 2
    • David Stutz and Andreas Geiger. Learning 3d shape completion under weak supervision. International Journal of Computer Vision, pages 1–20, 2018. 2
    • Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE international conference on computer vision, pages 945–953, 2015. 2
    • Chunyu Sun, Qianfang Zou, Xin Tong, and Yang Liu. Learning adaptive hierarchical cuboid abstractions of 3d shape collections. ACM Transactions on Graphics (SIGGRAPH Asia), 38(6), 2019. 2
    • Rapee Suveeranont and Takeo Igarashi. Example-based automatic font generation. In International Symposium on Smart Graphics, pages 127–138. Springer, 2010. 2
    • Shubham Tulsiani, Hao Su, Leonidas J Guibas, Alexei A Efros, and Jitendra Malik. Learning shape abstractions by assembling volumetric primitives. In Proc. CVPR, volume 2, 2017. 1, 2, 8, 13
    • Nobuyuki Umetani, Takeo Igarashi, and Niloy J Mitra. Guided exploration of physically valid shapes for furniture design. ACM Trans. Graph., 31(4):86–1, 2012. 2
    • Paul Upchurch, Noah Snavely, and Kavita Bala. From a to z: supervised transfer of style and content using deep neural network generators. arXiv preprint arXiv:1603.02003, 2016. 2
    • Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A Efros, Oliver Wang, and Eli Shechtman. Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems, pages 465–476, 2017. 4