DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
arXiv (2024)
Abstract
While large-scale pre-trained text-to-image models can synthesize diverse and
high-quality human-centered images, novel challenges arise with the nuanced task
of "identity fine editing": precisely modifying specific features of a subject
while maintaining its inherent identity and context. Existing personalization
methods, which either require time-consuming optimization or learn additional
encoders, are adept at "identity re-contextualization" but often struggle with
detailed and sensitive tasks such as human face editing. To address
these challenges, we introduce DreamSalon, a noise-guided, staged-editing
framework, uniquely focusing on detailed image manipulations and
identity-context preservation. By discerning editing and boosting stages via
the frequency and gradient of predicted noises, DreamSalon first performs
detailed manipulations on specific features in the editing stage, guided by
high-frequency information, and then employs stochastic denoising in the
boosting stage to improve image quality. For more precise editing, DreamSalon
semantically mixes source and target textual prompts, guided by differences in
their embedding covariances, to direct the model's focus on specific
manipulation areas. Our experiments demonstrate DreamSalon's ability to
efficiently and faithfully edit fine details on human faces, outperforming
existing methods both qualitatively and quantitatively.
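
As a rough illustration of how a predicted-noise map might be assigned to the
editing or boosting stage, the sketch below computes two simple statistics with
NumPy: the share of spectral energy above a radial frequency cutoff, and the
mean spatial gradient magnitude. This is not DreamSalon's actual criterion,
which the abstract does not specify. The function names, the cutoff, both
thresholds, and the reading of "gradient" as a spatial gradient (rather than,
say, a change across denoising steps) are all assumptions made for
illustration.

```python
import numpy as np


def high_freq_ratio(noise, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff.

    `cutoff` is in cycles per pixel (hypothetical default, not from the paper).
    """
    h, w = noise.shape
    power = np.abs(np.fft.fft2(noise)) ** 2
    # Frequency coordinates matching the unshifted FFT layout.
    fy = np.fft.fftfreq(h)
    fx = np.fft.fftfreq(w)
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)


def mean_gradient_magnitude(noise):
    """Average magnitude of the spatial gradient of a 2D noise map."""
    gy, gx = np.gradient(noise)
    return float(np.mean(np.hypot(gy, gx)))


def pick_stage(noise, freq_threshold=0.5, grad_threshold=0.5):
    """Label a step "editing" when both statistics are high, else "boosting".

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    is_high_freq = high_freq_ratio(noise) > freq_threshold
    is_steep = mean_gradient_magnitude(noise) > grad_threshold
    return "editing" if (is_high_freq and is_steep) else "boosting"


# Usage: a flat map carries no detail; a checkerboard is almost all detail.
flat = np.ones((8, 8))
rows, cols = np.indices((8, 8))
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)
print(pick_stage(flat), pick_stage(checker))
```

In an actual diffusion pipeline the input would be the denoiser's predicted
noise at each timestep, and the thresholds would need tuning per model and
resolution.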