Recipe 2 Image: Multimodal High-Resolution Text to Image Synthesis using Stacked Generative Adversarial Network

semanticscholar (2018)

Abstract
Recently the Recipe1M dataset, containing 800K pairs of recipes and their corresponding images, was published as part of [16]. In addition, recent works [18, 21, 22, 14] have shown that Conditional Generative Adversarial Networks (cGANs) [4, 10] are effective in synthesizing high-resolution images from text descriptions. In these works, image generation is conditioned on a single, simple sentence that describes the visual content of the image. In this paper we tackle the novel task of generating high-resolution 256 × 256 images of food conditioned on their recipes. As opposed to previous work, the relation between the recipe text and the visual content of the image is vague, and the textual structure of recipes is complex, consisting of two sections (ingredients and instructions), each containing multiple sentences. We used the Recipe1M dataset to train and evaluate our model, which is based on the StackGAN-v2 architecture [21]. In addition, as the condition to our cGAN, we leveraged two types of recipe embeddings introduced in [16]: the first is based on cosine similarity only, and the second combines it with a high-level classification objective as regularization. To the best of our knowledge, this is the first attempt to use an embedding method based on a high-level classification objective for the purpose of generating images from text.
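The conditioning step the abstract describes can be sketched in a few lines: a cGAN conditions its generator by combining the noise vector with the text embedding before the first upsampling stage. The following is a minimal sketch, assuming a 100-dimensional noise vector and a hypothetical 1024-dimensional recipe embedding (the actual dimensions used by the paper are not stated in the abstract).

```python
import random

def conditioned_latent(noise, recipe_embedding):
    # cGAN-style conditioning: concatenate the noise vector with the
    # recipe embedding to form the generator's conditioned input.
    return noise + recipe_embedding

z = [random.gauss(0.0, 1.0) for _ in range(100)]  # noise vector (assumed dim)
e = [0.0] * 1024                                  # hypothetical recipe embedding
latent = conditioned_latent(z, e)
print(len(latent))  # 1124
```

In StackGAN-v2 this conditioned latent feeds the lowest-resolution generator, and later stages refine the output up to 256 × 256; the sketch above covers only the concatenation, not the generator network itself.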