Generating, Reconstructing, and Representing Discrete and Continuous Data: Generalized Diffusion with Learnable Encoding-Decoding
CoRR (2024)
Abstract
The vast applications of deep generative models are anchored in three core
capabilities – generating new instances, reconstructing inputs, and learning
compact representations – across various data types, such as discrete
text/protein sequences and continuous images. Existing model families, like
Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs),
autoregressive models, and diffusion models, generally excel in specific
capabilities and data types but fall short in others. We introduce generalized
diffusion with learnable encoder-decoder (DiLED), which seamlessly integrates
the core capabilities for broad applicability and enhanced performance. DiLED
generalizes the Gaussian noising-denoising in standard diffusion by introducing
parameterized encoding-decoding. Crucially, DiLED is compatible with the
well-established diffusion model objective and training recipes, allowing
effective learning of the encoder-decoder parameters jointly with diffusion. By
choosing an appropriate encoder/decoder (e.g., large language models), DiLED
naturally applies to different data types. Extensive experiments on text,
proteins, and images demonstrate DiLED's flexibility to handle diverse data and
tasks and its strong improvement over various existing models.
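
To make the idea of wrapping Gaussian noising-denoising in a parameterized encoder-decoder concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation: the per-dimension linear encoder and decoder, the linear beta schedule, the noise-prediction loss, the unweighted sum of the two loss terms, and all function names (`make_alpha_bars`, `encode`, `decode`, `add_noise`, `joint_loss`) are hypothetical choices made only for illustration.

```python
import math
import random

# Illustrative toy only: every name and modeling choice here is an assumption,
# not something specified in the abstract.

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def encode(x, w_enc):
    """Stand-in for a learnable encoder: per-dimension scaling of the input."""
    return [w * xi for w, xi in zip(w_enc, x)]

def decode(z, w_dec):
    """Stand-in for a learnable decoder: per-dimension scaling of the latent."""
    return [w * zi for w, zi in zip(w_dec, z)]

def add_noise(z0, alpha_bar, rng):
    """Gaussian forward step: z_t = sqrt(abar) * z0 + sqrt(1 - abar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * e
          for z, e in zip(z0, eps)]
    return zt, eps

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def joint_loss(x, w_enc, w_dec, predict_eps, alpha_bars, rng):
    """Assumed joint objective: noise-prediction loss plus reconstruction loss."""
    z0 = encode(x, w_enc)                      # data -> latent
    t = rng.randrange(len(alpha_bars))         # random diffusion step
    zt, eps = add_noise(z0, alpha_bars[t], rng)
    denoise = mse(predict_eps(zt, t), eps)     # denoiser is trained to predict eps
    recon = mse(decode(z0, w_dec), x)          # decoder is trained to recover x
    return denoise + recon
```

In a real system the encoder and decoder would be neural networks (the abstract mentions large language models as one option) and the loss would be minimized with a gradient-based optimizer. The sketch only shows how a single training loss can depend on encoder, decoder, and denoiser parameters at once.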