MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems
arXiv (2024)
Abstract
Programming often involves converting detailed and complex specifications
into code, a process during which developers typically utilize visual aids to
more effectively convey concepts. While recent developments in Large Multimodal
Models have demonstrated remarkable abilities in visual reasoning and
mathematical tasks, little work has investigated whether these models
can effectively interpret visual elements for code generation. To this end, we
present MMCode, the first multi-modal coding dataset for evaluating algorithmic
problem-solving skills in visually rich contexts. MMCode contains 3,548
questions and 6,620 images collected from real-world programming challenges
harvested from 10 code competition websites. These problems are highly
challenging due to the extreme demand they place on reasoning abilities. Our experimental results show
that current state-of-the-art models struggle to solve these problems. The
results highlight the lack of powerful vision-code models, and we hope MMCode
can serve as an inspiration for future work in this domain. The data and code
are publicly available at https://github.com/happylkx/MMCode.