PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
CoRR (2024)
Abstract
This technical report introduces PIXART-δ, a text-to-image synthesis
framework that integrates the Latent Consistency Model (LCM) and ControlNet
into the advanced PIXART-α model. PIXART-α is recognized for its
ability to generate high-quality images of 1024px resolution through a
remarkably efficient training process. The integration of LCM in
PIXART-δ significantly accelerates the inference speed, enabling the
production of high-quality images in just 2-4 steps. Notably, PIXART-δ
generates 1024x1024-pixel images in as little as 0.5 seconds, a 7x
speedup over PIXART-α. Additionally,
PIXART-δ is designed to be efficiently trainable on 32GB V100 GPUs
within a single day. With its 8-bit inference capability (von Platen et al.,
2023), PIXART-δ can synthesize 1024px images within 8GB GPU memory
constraints, greatly enhancing its usability and accessibility. Furthermore,
incorporating a ControlNet-like module enables fine-grained control over
text-to-image diffusion models. We introduce a novel ControlNet-Transformer
architecture, specifically tailored for Transformers, achieving explicit
controllability alongside high-quality image generation. As a state-of-the-art,
open-source image generation model, PIXART-δ offers a promising
alternative to the Stable Diffusion family of models, contributing
significantly to text-to-image synthesis.
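The 2-4-step inference described above follows the multistep sampler of consistency models, which LCM adapts to latent space: a single network call maps a noisy input directly to a clean estimate, which is then re-noised to a lower noise level and denoised again. The following is a toy NumPy sketch of that loop, not the PIXART-δ implementation; the denoiser `toy_f` and the timestep values are illustrative assumptions.

```python
import numpy as np

def multistep_consistency_sample(f, shape, timesteps, sigma_min=0.002, rng=None):
    """Toy multistep consistency sampling loop: denoise, re-noise to the
    next (lower) timestep, denoise again. `f(x, t)` is assumed to map a
    noisy sample at noise level t to a clean estimate."""
    rng = np.random.default_rng() if rng is None else rng
    t_max = timesteps[0]
    x = rng.standard_normal(shape) * t_max        # start from pure noise at t_max
    x0 = f(x, t_max)                              # first denoising call
    for t in timesteps[1:]:                       # 1-3 extra refinement steps
        noise = rng.standard_normal(shape)
        # re-noise the clean estimate down to level t, then denoise again
        x = x0 + np.sqrt(max(t**2 - sigma_min**2, 0.0)) * noise
        x0 = f(x, t)
    return x0

# Illustrative stand-in denoiser: posterior mean for N(0, 1) data under
# x_t = x0 + t*z (an assumption for this toy, not a trained LCM).
toy_f = lambda x, t: x / (1.0 + t**2)

sample = multistep_consistency_sample(
    toy_f, (4,), [80.0, 2.0, 0.5, 0.1], rng=np.random.default_rng(0)
)
```

With four entries in `timesteps`, the network is called four times, matching the 2-4-step regime the abstract reports; in PIXART-δ the denoiser is the latent-space Transformer rather than this analytic toy.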