Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation
arXiv (2024)
Abstract
Recent years have seen significant progress in human image generation,
particularly with the advancements in diffusion models. However, existing
diffusion methods encounter challenges when producing consistent hand anatomy
and the generated images often lack precise control over the hand pose. To
address this limitation, we introduce a novel approach to pose-conditioned
human image generation, dividing the process into two stages: hand generation
and subsequent body out-painting around the hands. We propose training the hand
generator in a multi-task setting to produce both hand images and their
corresponding segmentation masks, and employ the trained model in the first
stage of generation. An adapted ControlNet model is then used in the second
stage to outpaint the body around the generated hands, producing the final
result. A novel blending technique that combines the results of both stages in
a coherent way is introduced to preserve the hand details during the second
stage. This involves sequentially expanding the out-painted region while fusing
the latent representations, ensuring a seamless and cohesive synthesis of the
final image. Experimental evaluations demonstrate the superiority of our
proposed method over state-of-the-art techniques, in both pose accuracy and
image quality, as validated on the HaGRID dataset. Our approach not only
enhances the quality of the generated hands but also offers improved control
over hand pose, advancing the capabilities of pose-conditioned human image
generation. The source code of the proposed approach is available at
https://github.com/apelykh/hand-to-diffusion.
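
The second-stage blending described above, where the out-painted region is expanded step by step while the latent representations of the two stages are fused, can be illustrated with a small sketch. The snippet below is a minimal, hypothetical interpretation using PyTorch tensors as stand-ins for diffusion latents; the function names (expand_mask, fuse_latents), the dilation-based expansion schedule, and all parameters are assumptions made for illustration, not the authors' released implementation (see the linked repository for that).

# Minimal sketch (not the authors' code) of mask-based latent fusion:
# stage-1 hand latents are kept fixed inside a hand mask while the
# out-painted (body) region is expanded over the denoising steps, so the
# body is synthesized around the preserved hands. All names are illustrative.
import torch
import torch.nn.functional as F

def expand_mask(mask: torch.Tensor, pixels: int) -> torch.Tensor:
    """Dilate a binary mask of shape (B, 1, H, W) by `pixels` via max pooling."""
    k = 2 * pixels + 1
    return F.max_pool2d(mask, kernel_size=k, stride=1, padding=pixels)

def fuse_latents(body_latents: torch.Tensor,
                 hand_latents: torch.Tensor,
                 hand_mask: torch.Tensor,
                 step: int,
                 total_steps: int,
                 max_expand: int = 8) -> torch.Tensor:
    """Blend stage-1 hand latents into stage-2 body latents.

    The region taken from the body branch grows as denoising progresses,
    which is one plausible reading of 'sequential expansion of the
    out-painted region'. In a real pipeline this would be applied after
    each denoising step of the stage-2 out-painting model.
    """
    # Grow the out-painted region around the hands over time.
    expand = int(max_expand * step / max(total_steps - 1, 1))
    outpaint_mask = expand_mask(1.0 - hand_mask, expand)
    keep_hand = (1.0 - outpaint_mask).clamp(0.0, 1.0)
    return keep_hand * hand_latents + (1.0 - keep_hand) * body_latents

# Toy usage with random tensors standing in for diffusion latents.
if __name__ == "__main__":
    B, C, H, W = 1, 4, 64, 64
    hand_latents = torch.randn(B, C, H, W)
    body_latents = torch.randn(B, C, H, W)
    hand_mask = torch.zeros(B, 1, H, W)
    hand_mask[..., 24:40, 24:40] = 1.0  # hypothetical hand segmentation region
    for t in range(50):
        body_latents = fuse_latents(body_latents, hand_latents, hand_mask, t, 50)
    print(body_latents.shape)  # torch.Size([1, 4, 64, 64])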