Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering
arXiv (2024)
Abstract
This work explores the zero-shot capabilities of foundation models in Visual
Question Answering (VQA) tasks. We propose an adaptive multi-agent system,
named Multi-Agent VQA, to overcome the limitations of foundation models in
object detection and counting by using specialized agents as tools. Unlike
existing approaches, our study focuses on the system's performance without
fine-tuning it on specific VQA datasets, making it more practical and robust in
the open world. We present preliminary experimental results under zero-shot
scenarios and highlight some failure cases, offering new directions for future
research.