Recipe2Video: Synthesizing Personalized Videos from Recipe Texts

Prateksha Udhayanan, B. Suryateja, Parth Laturia, Dev Chauhan, Darshan Khandelwal, Stefano Petrangeli, Balaji Vasan Srinivasan

WACV (2023)

Abstract
Procedural texts are a special type of document that contains complex textual descriptions for carrying out a sequence of instructions. Because such texts lack visual cues, it is often difficult to consume the information effectively. In this paper, we focus on recipes, a particular type of procedural document, and introduce Recipe2Video, a novel deep-learning-driven system that automatically converts a recipe document into a multimodal illustrative video. Our method employs novel retrieval and re-ranking methods to select the best set of images and videos that can provide the desired illustration. We formulate a Viterbi-based optimization algorithm to stitch together a coherent video that combines visual cues, text, and voice-over to present an enhanced mode of consumption. We design automated metrics and compare performance against several baselines on two recipe datasets (RecipeQA and Tasty Videos). Our results on downstream tasks and human studies indicate that Recipe2Video captures the semantic and sequential information of the input in the generated video.
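The abstract does not spell out the Viterbi-based optimization, so the following is only a minimal sketch of how such a selection could work, not the authors' formulation. It assumes each recipe step has a list of retrieved candidates with a relevance score, and that a pairwise coherence score rewards visually consistent transitions between consecutive steps. The names `viterbi_select`, `relevance`, and `coherence` are hypothetical.

```python
from typing import Callable, List, Sequence


def viterbi_select(
    relevance: Sequence[Sequence[float]],
    coherence: Callable[[int, int, int], float],
) -> List[int]:
    """Pick one candidate per step, maximizing relevance plus pairwise coherence.

    relevance[t][i]    -- assumed score of candidate i for step t
    coherence(t, i, j) -- assumed score for candidate i at step t-1
                          being followed by candidate j at step t
    """
    if not relevance:
        return []

    # best[i] = best total score of any path ending in candidate i at the current step
    best = list(relevance[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(relevance)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j, rel in enumerate(relevance[t]):
            # Best predecessor for candidate j at step t
            prev = max(range(len(best)), key=lambda i: best[i] + coherence(t, i, j))
            new_best.append(best[prev] + coherence(t, prev, j) + rel)
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)

    # Trace the best path back from the final step
    last = max(range(len(best)), key=lambda i: best[i])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Toy example: two candidates per step; coherence rewards keeping the same index,
# standing in for "same visual source" between consecutive steps.
toy_relevance = [[1.0, 0.9], [0.2, 0.8], [0.5, 0.4]]
toy_coherence = lambda t, i, j: 1.0 if i == j else 0.0
selected = viterbi_select(toy_relevance, toy_coherence)
```

In this toy case, picking the most relevant candidate at each step independently would give indices 0, 1, 0, while the joint optimization prefers the consistent sequence 1, 1, 1 because the coherence gains outweigh the small relevance losses.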
Keywords
Applications: Visualization, Vision + language and/or other modalities