
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

Annual Meeting of the Association for Computational Linguistics (2024)

University of Chinese Academy of Sciences (UCAS); Beijing Academy of Artificial Intelligence

Abstract
Visual grounding (VG) aims at locating the foreground entities that match the given natural language expression. Previous datasets and methods for the classic VG task mainly rely on the prior assumption that the given expression must literally refer to the target object, which greatly impedes the practical deployment of agents in real-world scenarios. Since users usually prefer to provide intention-based expressions for the desired object instead of covering all the details, it is necessary for agents to interpret intention-driven instructions. Thus, in this work, we take a step further towards intention-driven visual-language (V-L) understanding. To promote classic VG towards human intention interpretation, we propose a new intention-driven visual grounding (IVG) task and build the largest-scale IVG dataset, named IntentionVG, with free-form intention expressions. Considering that practical agents need to move and find specific targets among various scenarios to realize the grounding task, our IVG task and IntentionVG dataset take the crucial properties of both multi-scenario perception and egocentric view into consideration. Besides, various types of models are set up as baselines to realize our IVG task. Extensive experiments on our IntentionVG dataset and baselines demonstrate the necessity and efficacy of our method for the V-L field. To foster future research in this direction, our newly built dataset and baselines will be publicly available.
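To illustrate the difference between classic VG and the proposed IVG task, the minimal Python sketch below contrasts a literal referring expression with an intention-based one. All field names, file names, and values are illustrative assumptions, not the actual IntentionVG annotation schema.

```python
# Hypothetical sketch contrasting classic VG and intention-driven VG (IVG).
# Field names and values are illustrative assumptions, not the real
# IntentionVG annotation format.

# Classic VG: the expression literally describes the target object.
classic_vg_sample = {
    "image": "kitchen_scene.jpg",             # assumed file name
    "expression": "the red mug on the left side of the counter",
    "target_bbox": [120, 85, 210, 170],       # [x1, y1, x2, y2], assumed
}

# IVG: the expression states the user's intention; the agent must infer
# which object in the scene satisfies it.
ivg_sample = {
    "image": "kitchen_scene_egocentric.jpg",  # egocentric view, assumed
    "intention": "I'm thirsty and want something to drink water from",
    "target_bbox": [120, 85, 210, 170],       # same grounding format, assumed
    "scenario": "kitchen",                    # multi-scenario tag, assumed
}

if __name__ == "__main__":
    for name, sample in [("classic VG", classic_vg_sample), ("IVG", ivg_sample)]:
        query = sample.get("expression") or sample.get("intention")
        print(f"{name}: query={query!r} -> bbox={sample['target_bbox']}")
```

In both cases the output is a bounding box for the target object; the difference is that the IVG query never names the object, so the model must reason from the stated intention to the entity that fulfills it.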
Key words
Data Visualization, Natural Language Generation