TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
CVPR 2024 (2023)
Abstract
Recent progress in the text-driven 3D stylization of a single object has been
considerably promoted by CLIP-based methods. However, the stylization of
multi-object 3D scenes is still impeded because the image-text pairs used for
pre-training CLIP mostly depict a single object. Meanwhile, the local details of
multiple objects are easily omitted because existing supervision relies
primarily on the coarse-grained contrast of image-text pairs. To
overcome these challenges, we present a novel framework, dubbed TeMO, to parse
multi-object 3D scenes and edit their styles under the contrast supervision at
multiple levels. We first propose a Decoupled Graph Attention (DGA) module to
distinguishably reinforce the features of 3D surface points. In particular, a
cross-modal graph is constructed to accurately align the object points and noun
phrases decoupled from the 3D mesh and the textual description. Then, we develop a
Cross-Grained Contrast (CGC) supervision system, where a fine-grained loss
between the words in the textual description and the randomly rendered images
is constructed to complement the coarse-grained loss. Extensive experiments
show that our method can synthesize high-quality stylized content and
outperforms existing methods over a wide range of multi-object 3D meshes.
Our code and results will be made publicly available.
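
The idea of complementing a coarse-grained loss with a fine-grained one can be illustrated with a minimal sketch. The abstract does not give the actual loss formulas, so everything below is an assumption made for illustration: cosine similarity as the matching score, taking the best-matching rendered view for each word, the `fine_weight` parameter, and all function names are hypothetical and are not taken from TeMO. Embeddings are plain lists of floats standing in for CLIP-style features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def coarse_loss(view_embs, sentence_emb):
    """Coarse term (assumed form): 1 - mean similarity between each
    rendered view and the embedding of the whole sentence."""
    sims = [cosine(view, sentence_emb) for view in view_embs]
    return 1.0 - sum(sims) / len(sims)


def fine_loss(view_embs, word_embs):
    """Fine term (assumed form): every word should be well matched by at
    least one rendered view, so take the best view per word."""
    best = [max(cosine(view, word) for view in view_embs) for word in word_embs]
    return 1.0 - sum(best) / len(best)


def cross_grained_loss(view_embs, sentence_emb, word_embs, fine_weight=0.5):
    """Weighted sum of both terms; fine_weight is a hypothetical knob."""
    return coarse_loss(view_embs, sentence_emb) + fine_weight * fine_loss(
        view_embs, word_embs
    )


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.0, 1.0]]   # two rendered views (toy features)
    sentence = [1.0, 1.0]              # whole-description embedding
    words = [[1.0, 0.0], [0.0, 1.0]]   # one embedding per word
    print(cross_grained_loss(views, sentence, words))
```

In this toy setup each word is matched exactly by one view, so the fine term vanishes while the coarse term stays positive, which shows how a sentence-level score alone can hide whether individual words are represented in the renderings.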