MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation
arXiv (2024)
Abstract
This paper introduces MMTryon, a multi-modal multi-reference VIrtual Try-ON
(VITON) framework, which can generate high-quality compositional try-on results
by taking as inputs a text instruction and multiple garment images. Our MMTryon
mainly addresses two problems overlooked in prior literature: 1) Support of
multiple try-on items and dressing styles. Existing methods are commonly designed
for single-item try-on tasks (e.g., upper/lower garments, dresses) and fall
short on customizing dressing styles (e.g., zipped/unzipped, tuck-in/tuck-out,
etc.). 2) Segmentation dependency. Existing methods also rely heavily on
category-specific segmentation models to identify the replacement regions, with
segmentation errors directly leading to significant artifacts in the try-on
results. For the first issue, our MMTryon introduces a novel multi-modality and
multi-reference attention mechanism to combine the garment information from
reference images and dressing-style information from text instructions.
Besides, to remove the segmentation dependency, MMTryon uses a parsing-free
garment encoder and leverages a novel scalable data generation pipeline to
convert existing VITON datasets to a form that allows MMTryon to be trained
without requiring any explicit segmentation. Extensive experiments on
high-resolution benchmarks and in-the-wild test sets demonstrate MMTryon's
superiority over existing SOTA methods both qualitatively and quantitatively.
Moreover, MMTryon's impressive performance on multi-item and style-controllable
virtual try-on scenarios, together with its ability to try on any outfit in a
large variety of scenarios from any source image, opens up a new avenue for
future investigation in the fashion community.
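
The abstract does not spell out the multi-modality and multi-reference attention mechanism, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that text-instruction tokens and the tokens of each reference garment image are concatenated into one context that person-image queries attend over with scaled dot-product attention. The function name `multi_reference_attention`, the use of the context as both keys and values (no learned projections), and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def multi_reference_attention(queries, text_tokens, garment_token_sets):
    """Hypothetical sketch: queries attend jointly over text and garment tokens.

    queries            -- list of d-dim vectors (e.g. person-image features)
    text_tokens        -- list of d-dim vectors from the text instruction
    garment_token_sets -- one list of d-dim vectors per reference garment
    """
    # Assumption: all modalities share one embedding space and are
    # simply concatenated into a single attention context.
    context = list(text_tokens)
    for tokens in garment_token_sets:
        context.extend(tokens)

    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        # The context serves as both keys and values in this toy version.
        weights = softmax([dot(q, k) * scale for k in context])
        out = [
            sum(w * v[d] for w, v in zip(weights, context))
            for d in range(len(q))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    person_queries = [[1.0, 0.0], [0.0, 1.0]]
    text = [[0.5, 0.5]]                          # one instruction token
    garments = [[[1.0, 0.0]], [[0.0, 1.0], [0.2, 0.8]]]  # two references
    for row in multi_reference_attention(person_queries, text, garments):
        print(row)
```

Because every query sees the text tokens and all garment references in a single softmax, adding another garment only lengthens the context; whether MMTryon combines the modalities in exactly this way is not stated in the abstract.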