Open-domain Visual Entity Linking

ICLR 2023 (2023)

Abstract
We introduce the task of Open-domain Visual Entity Linking (OVEN), targeting a wide range of entities including animals, plants, buildings, locations, and more. Given an image (e.g., an image of an aircraft), a text query ("What is the model?" or "What is the airline?"), and a multi-modal knowledge base (e.g., Wikipedia), the goal is to link to an entity (Boeing-777 or EVA Air) out of all entities in the knowledge base. We build a benchmark dataset (OVEN-wiki) by repurposing 14 existing image classification, image retrieval, and visual QA datasets. We link all existing labels to Wikipedia entities where possible, using a state-of-the-art entity linking system and human annotators, creating a diverse and unified label space. OVEN is a rich and challenging task, requiring models to recognize and link visual content both to a small set of seen entities and to a much larger set of unseen entities (e.g., unseen aircraft models). OVEN also requires models to generalize to previously unseen intents that may demand more fine-grained reasoning ("Who manufactured the aircraft in the back?"). We build strong baselines based on state-of-the-art pre-trained models and find that current pre-trained models struggle to address the challenges posed by OVEN. We hope OVEN will inspire next-generation pre-training techniques and pave the way for future knowledge-intensive vision tasks.
Keywords
Open-domain Visual Entity Linking, Vision and Language