MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

Xiangxiang Chu, Limeng Qiao,Xinyu Zhang, Shuang Xu, Fei Wei,Yang Yang,Xiaofei Sun, Yiming Hu, Xinyang Lin,Bo Zhang,Chunhua Shen

CoRR(2024)

引用 0|浏览24
暂无评分
摘要
We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a large variety of VLMs at the 7B+ scale. Our models will be released at https://github.com/Meituan-AutoML/MobileVLM .
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要