CookGALIP: Recipe Controllable Generative Adversarial CLIPs With Sequential Ingredient Prompts for Food Image Generation

Mengling Xu, Jie Wang, Ming Tao, Bing-Kun Bao, Changsheng Xu

IEEE Transactions on Multimedia (2024)

Abstract
Generating food images from recipes is a challenging task in food analysis: recipes are lengthy texts that carry far more semantic information than a food image conveys, which makes it difficult to align the features of the two modalities. Existing studies usually concatenate the representations of ingredients and cooking instructions directly and use the concatenated representations to generate food images through generative adversarial networks (GANs). However, previous models generally ignore the sequential information contained in complicated procedural instructions, which leads to semantic inconsistency between recipes and generated food images. Furthermore, current models still struggle to distinguish and control fine-grained features, which results in entangled ingredient features in the generated food images. To this end, we propose CookGALIP, which strengthens semantic consistency and controllability for food image generation. Building on the recently proposed text-to-image framework GALIP, we design two modules: 1) To incorporate sequential relationships into the food image generation process, we propose a Recipe Fusion Module (RFM) that fuses the semantics of cooking instructions, so as to balance the semantic complexity between modalities and improve the semantic consistency between recipes and generated food images. 2) To distinguish and control fine-grained ingredient features, we introduce an Ingredient Control Module (ICM) that generates sequential ingredient prompts, enabling more refined control over the recipe-to-food synthesis process. Experimental results on the Recipe1M and Vireo Food-172 datasets show that the proposed model outperforms state-of-the-art methods.
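The abstract's point about sequential information can be illustrated with a toy sketch. This is not the paper's RFM, whose architecture the abstract does not specify. It only contrasts an order-agnostic fusion (mean pooling, used here as a simple stand-in) with an order-aware recurrent fusion. The function names, the `decay` parameter, and the two-dimensional "embeddings" are all hypothetical and chosen purely for illustration.

```python
import math


def mean_pool(steps):
    """Order-agnostic fusion: element-wise average of the step embeddings."""
    n = len(steps)
    return [sum(col) / n for col in zip(*steps)]


def recurrent_fuse(steps, decay=0.5):
    """Order-aware fusion (toy recurrence): h_t = tanh(decay * h_{t-1} + x_t)."""
    h = [0.0] * len(steps[0])
    for x in steps:
        h = [math.tanh(decay * h_i + x_i) for h_i, x_i in zip(h, x)]
    return h


# Made-up 2-d embeddings for two instruction steps (assumption, not real data).
chop = [1.0, 0.0]
fry = [0.0, 1.0]

forward = [chop, fry]   # "chop, then fry"
reverse = [fry, chop]   # "fry, then chop"

# Mean pooling cannot tell the two orders apart; the recurrence can.
same_pooled = mean_pool(forward) == mean_pool(reverse)
same_recurrent = recurrent_fuse(forward) == recurrent_fuse(reverse)
```

In this sketch `same_pooled` is true while `same_recurrent` is false, which is the kind of order sensitivity a fusion over procedural instructions would need in order to reflect step order.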
Keywords
Recipe-to-food image generation, Prompt learning, Generative adversarial network