ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing
CoRR (2024)
Abstract
Recently, researchers have proposed powerful systems for generating and manipulating images using natural language instructions. However, it is difficult to precisely specify many common classes of image transformations with text alone. For example, a user may wish to change the location and breed of a particular dog in an image with several similar dogs. This task is quite difficult with natural language alone, and would require a user to write a laboriously complex prompt that both disambiguates the target dog and describes the destination. We propose ClickDiffusion, a system for precise image manipulation and generation that combines natural language instructions with visual feedback provided by the user through a direct manipulation interface. We demonstrate that by serializing both an image and a multi-modal instruction into a textual representation it is possible to leverage LLMs to perform precise transformations of the layout and appearance of an image. Code available at https://github.com/poloclub/ClickDiffusion.
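The serialization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `SceneObject` schema, the JSON layout, and the helper names `object_at` and `serialize_request` are all assumptions made for illustration. The sketch shows how a click could disambiguate one of several similar objects, and how the layout, the selection, and the text instruction might be combined into a single string that an LLM could read.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SceneObject:
    """One object in the image layout (hypothetical schema, not from the paper)."""
    obj_id: int
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def object_at(objects, click):
    """Return the smallest-area object whose box contains the click, or None."""
    x, y = click
    hits = [o for o in objects
            if o.box[0] <= x <= o.box[2] and o.box[1] <= y <= o.box[3]]
    if not hits:
        return None
    return min(hits, key=lambda o: (o.box[2] - o.box[0]) * (o.box[3] - o.box[1]))


def serialize_request(objects, click, instruction):
    """Combine layout, clicked object, and text instruction into one text prompt."""
    target = object_at(objects, click)
    payload = {
        "objects": [asdict(o) for o in objects],
        "selected_id": target.obj_id if target else None,
        "instruction": instruction,
    }
    return json.dumps(payload, indent=2)


# Three similar dogs; the click lands inside the second one.
scene = [
    SceneObject(1, "dog", (10, 200, 110, 300)),
    SceneObject(2, "dog", (150, 210, 250, 310)),
    SceneObject(3, "dog", (300, 190, 400, 290)),
]
prompt = serialize_request(scene, (200, 250), "move this dog to the left and make it a corgi")
```

The resulting string names the target by `selected_id`, so the instruction can say "this dog" without a long disambiguating description. Sending the string to an LLM and rendering the edited layout are left out of the sketch.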
Key words
Interoperable