Probing Commonsense Reasoning Capability of Text-to-Image Generative Models via Non-visual Description
CoRR (2023)
Abstract
Commonsense reasoning, the ability to make logical assumptions about everyday
scenes, is a core component of human intelligence. In this work, we present a
novel task and dataset for evaluating the ability of text-to-image generative
models to conduct commonsense reasoning, which we call PAINTaboo. Given a
description of an object that contains few visual clues, the goal is to generate
images that depict the object correctly. The dataset is carefully hand-curated
and covers diverse object categories to support a comprehensive analysis of
model performance.
Our investigation of several prevalent text-to-image generative models reveals
that, as anticipated, these models are not proficient in commonsense reasoning.
We hope that PAINTaboo can improve our understanding of the reasoning
abilities of text-to-image generative models.