Visual Madlibs: Fill in the Blank Description Generation and Question Answering

ICCV (2015)

Abstract
In this paper, we introduce a new dataset consisting of 360,001 focused natural language descriptions for 10,738 images. This dataset, the Visual Madlibs dataset, is collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions about: people and objects, their appearances, activities, and interactions, as well as inferences about the general scene or its broader context. We provide several analyses of the Visual Madlibs dataset and demonstrate its applicability to two new description generation tasks: focused description generation, and multiple-choice question-answering for images. Experiments using joint-embedding and deep learning methods show promising results on these tasks.
Keywords
fill-in-the-blank description generation, natural language descriptions, Visual Madlibs dataset, fill-in-the-blank templates, targeted descriptions, description generation tasks, multiple-choice question-answering