DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
arXiv (2023)
Abstract
Large, pretrained latent diffusion models (LDMs) have demonstrated an
extraordinary ability to generate creative content, specialize to user data
through few-shot fine-tuning, and condition their output on other modalities,
such as semantic maps. However, are they usable as large-scale data generators,
e.g., to improve tasks in the perception stack, like semantic segmentation? We
investigate this question in the context of autonomous driving, and answer it
with a resounding "yes". We propose an efficient data generation pipeline
termed DGInStyle. First, we examine the problem of specializing a pretrained
LDM to semantically controlled generation within a narrow domain. Second, we
propose a Style Swap technique to endow the rich generative prior with the
learned semantic control. Third, we design a Multi-resolution Latent Fusion
technique to overcome the bias of LDMs towards dominant objects. Using
DGInStyle, we generate a diverse dataset of street scenes, train a
domain-agnostic semantic segmentation model on it, and evaluate the model on
multiple popular autonomous driving datasets. Our approach consistently
improves the performance of several domain generalization methods over the
previous state of the art. Source code and dataset are available at
https://dginstyle.github.io.