On the Stability of Iterative Retraining of Generative Models on their own Data
ICLR 2024
Abstract
Deep generative models have made tremendous progress in modeling complex
data, often exhibiting generation quality that surpasses a typical human's
ability to discern the authenticity of samples. Undeniably, a key driver of
this success is the massive amount of web-scale data consumed by these models.
Given these models' striking performance and wide availability, the web will
inevitably be increasingly populated with synthetic
content. Such a fact directly implies that future iterations of generative
models must contend with the reality that their training is curated from both
clean data and artificially generated data from past models. In this paper, we
develop a framework to rigorously study the impact of training generative
models on mixed datasets (of real and synthetic data) on their stability. We
first prove the stability of iterative training under the condition that the
initial generative models approximate the data distribution well enough and the
proportion of clean training data (w.r.t. synthetic data) is large enough. We
empirically validate our theory on both synthetic and natural images by
iteratively training normalizing flows and state-of-the-art diffusion models on
CIFAR10 and FFHQ.
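The iterative retraining setup the abstract describes can be illustrated with a hypothetical toy sketch (not the paper's actual experimental code): a 1D Gaussian "generative model" fit by maximum likelihood is retrained over several rounds on a mixture of clean data and samples drawn from the previous round's model. The mixing fraction `clean_frac`, the Gaussian model, and all function names are illustrative assumptions, standing in for the paper's proportion of clean training data and its flow/diffusion models.

```python
import random

def fit_gaussian(samples):
    # Maximum-likelihood estimate of mean and std for a 1D Gaussian.
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var ** 0.5

def iterative_retrain(real_data, clean_frac, rounds, n_train, rng):
    # Round 0: fit the initial model on clean data only.
    mu, sigma = fit_gaussian(real_data)
    for _ in range(rounds):
        # Each round's training set mixes clean data with samples
        # generated by the previous round's model.
        n_clean = int(clean_frac * n_train)
        clean = [rng.choice(real_data) for _ in range(n_clean)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_train - n_clean)]
        mu, sigma = fit_gaussian(clean + synthetic)
    return mu, sigma

rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # "clean" data ~ N(0, 1)
mu, sigma = iterative_retrain(real, clean_frac=0.8, rounds=20,
                              n_train=5000, rng=rng)
print(mu, sigma)
```

With a large enough clean fraction, the fitted parameters stay close to those of the true data distribution across rounds, mirroring the stability condition the paper proves; shrinking `clean_frac` toward zero lets estimation error compound from round to round.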
Keywords
Generative Models, Iterative Training, Diffusion