DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
arXiv (2023)
Abstract
Accurate and controllable image editing is a challenging task that has
attracted significant attention recently. Notably, DragGAN is an interactive
point-based image editing framework that achieves impressive editing results
with pixel-level precision. However, due to its reliance on generative
adversarial networks (GANs), its generality is limited by the capacity of
pretrained GAN models. In this work, we extend this editing framework to
diffusion models and propose a novel approach DragDiffusion. By harnessing
large-scale pretrained diffusion models, we greatly enhance the applicability
of interactive point-based editing on both real and diffusion-generated images.
Our approach involves optimizing the diffusion latents to achieve precise
spatial control. The supervision signal for this optimization comes from
the diffusion model's UNet features, which are known to contain rich semantic
and geometric information. Moreover, we introduce two additional techniques,
namely LoRA fine-tuning and latent-MasaCtrl, to further preserve the identity
of the original image. Lastly, we present a challenging benchmark dataset
called DragBench – the first benchmark to evaluate the performance of
interactive point-based image editing methods. Experiments across a wide range
of challenging cases (e.g., images with multiple objects, diverse object
categories, and various styles) demonstrate the versatility and generality of
DragDiffusion. Code: https://github.com/Yujun-Shi/DragDiffusion.
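
The idea of optimizing a latent under feature-based supervision can be illustrated with a toy sketch. This is not the DragDiffusion implementation: it assumes a 1D latent, uses the latent itself as a stand-in for UNet features, and replaces real point tracking with a fixed one-pixel step. The function name `drag_1d` and all parameters are hypothetical. It only shows the general shape of a DragGAN-style update, in which the feature one step ahead of the handle point is pulled toward the handle's own feature, which is held constant (stop-gradient).

```python
# Toy sketch under stated assumptions, not the authors' method:
# - the "latent" is a plain 1D list of floats
# - the "feature map" is the latent itself (stand-in for UNet features)
# - point tracking is replaced by moving the handle one index per outer step

def drag_1d(latent, handle, target, lr=0.25, inner_steps=20):
    """Drag the content at index `handle` toward index `target`.

    For each position along the path, minimize (z[next] - ref)^2 by
    gradient descent, where ref = z[handle] is treated as a constant.
    """
    z = list(latent)  # optimize a copy; the input list is left untouched
    step = 1 if target > handle else -1
    while handle != target:
        nxt = handle + step
        ref = z[handle]  # constant reference value (stop-gradient)
        for _ in range(inner_steps):
            grad = 2.0 * (z[nxt] - ref)  # derivative of (z[nxt] - ref)^2 w.r.t. z[nxt]
            z[nxt] -= lr * grad
        handle = nxt  # toy tracking: the handle simply follows the step
    return z


# Example: a single bright "pixel" at index 2 is dragged to index 5.
edited = drag_1d([0.0, 0.0, 1.0, 0.0, 0.0, 0.0], handle=2, target=5)
print([round(v, 3) for v in edited])
```

In this toy the value at the handle is copied along the path rather than moved. Leaving the source region and the rest of the image undisturbed is a separate concern, which the abstract attributes to identity-preserving techniques such as LoRA fine-tuning and latent-MasaCtrl.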