Pix2Pix-OnTheFly: Leveraging LLMs for Instruction-Guided Image Editing
arXiv (2024)
Abstract
The combination of language processing and image processing continues to attract
growing interest, given recent impressive advances that leverage the combined
strengths of both research domains. Among these advances, the task of
editing an image solely on the basis of a natural language instruction stands
out as a most challenging endeavour. While recent approaches to this task
resort, in one way or another, to some form of preliminary preparation, training,
or fine-tuning, this paper explores a novel approach: we propose a
preparation-free method that permits instruction-guided image editing on the
fly. The approach is organized into three orchestrated steps: image
captioning and DDIM inversion, followed by obtaining the edit
direction embedding, followed by the image editing proper. While dispensing with
preliminary preparation, our approach proves to be effective and
competitive, outperforming recent state-of-the-art models for this task when
evaluated on the MAGICBRUSH dataset.
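
The abstract does not say how the edit direction embedding is computed. As a rough illustration only, the sketch below uses one common formulation from the image-editing literature: the difference between the mean text embedding of captions describing the edited image and the mean text embedding of captions describing the original image. Everything here is an assumption rather than the paper's implementation: `toy_embed` is a hypothetical stand-in for a real text encoder, `EMBED_DIM` is an arbitrary toy size, and the captions are invented examples.

```python
import hashlib

import numpy as np

EMBED_DIM = 16  # toy dimensionality (assumption, chosen for illustration)


def toy_embed(caption: str) -> np.ndarray:
    """Hypothetical stand-in for a text encoder.

    Maps each word to a fixed pseudo-random vector (seeded by a hash of the
    word) and averages the word vectors. A real pipeline would use a
    pretrained text encoder instead.
    """
    vectors = []
    for word in caption.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:4], "little")
        vectors.append(np.random.default_rng(seed).standard_normal(EMBED_DIM))
    return np.mean(vectors, axis=0)


def edit_direction(source_captions, target_captions) -> np.ndarray:
    """Mean target-caption embedding minus mean source-caption embedding."""
    src = np.mean([toy_embed(c) for c in source_captions], axis=0)
    tgt = np.mean([toy_embed(c) for c in target_captions], axis=0)
    return tgt - src


# Invented example: the source caption describes the input image, the target
# caption describes the image after applying an instruction such as
# "turn the cat into a dog".
source = ["a photo of a cat on a sofa"]
target = ["a photo of a dog on a sofa"]
direction = edit_direction(source, target)
```

In a formulation of this kind, the direction vector would be added to the text conditioning while denoising from the DDIM-inverted latent, so that the output moves toward the target description while the rest of the image is preserved. Whether the paper follows this exact scheme is not stated in the abstract.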