Open-domain Visual Entity Linking

ICLR 2023 (2023)

Abstract
We introduce the task of Open-domain Visual Entity Linking (OVEN), targeting a wide range of entities including animals, plants, buildings, locations, and more. Given an image (e.g., an image of an aircraft), a text query ("What is the model?" or "What is the airline?"), and a multi-modal knowledge base (e.g., Wikipedia), the goal is to link to an entity (Boeing-777 or EVA Air) out of all entities in the knowledge base. We build a benchmark dataset (OVEN-wiki) by repurposing 14 existing image classification, image retrieval, and visual QA datasets. We link all existing labels to Wikipedia entities where possible, using a state-of-the-art entity linking system and human annotators, creating a diverse and unified label space. OVEN is a rich and challenging task, requiring models to recognize and link visual content both to a small set of seen entities and to a much larger set of unseen entities (e.g., unseen aircraft models). OVEN also requires models to generalize to previously unseen intents that may demand more fine-grained reasoning ("Who manufactured the aircraft in the back?"). We build strong baselines based on state-of-the-art pre-trained models and find that current pre-trained models struggle to address the challenges posed by OVEN. We hope OVEN will inspire next-generation pre-training techniques and pave the way for future knowledge-intensive vision tasks.
Keywords
Open-domain Visual Entity Linking, Vision and Language