Feedback-Generation for Programming Exercises With GPT-4
arXiv (2024)
Abstract
Ever since Large Language Models (LLMs) and related applications have become
broadly available, several studies have investigated their potential for assisting
educators and supporting students in higher education. LLMs such as Codex,
GPT-3.5, and GPT-4 have shown promising results in the context of large
programming courses, where students can benefit from feedback and hints if
provided timely and at scale. This paper explores the quality of GPT-4 Turbo's
generated output for prompts containing both the programming task specification
and a student's submission as input. Two assignments from an introductory
programming course were selected, and GPT-4 was asked to generate feedback for
55 randomly chosen, authentic student programming submissions. The output was
qualitatively analyzed regarding correctness, personalization, fault
localization, and other features identified in the material. Compared to prior
work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For
example, the output is more structured and consistent. GPT-4 Turbo can also
accurately identify invalid casing in student programs' output. In some cases,
the feedback also includes the output of the student program. At the same time,
inconsistent feedback was noted, such as stating that a submission is correct
but that an error still needs to be fixed. The present work deepens our
understanding of LLMs' potential and limitations, of how to integrate them into
e-assessment systems and pedagogical scenarios, and of how to instruct students
who use applications based on GPT-4.
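The setup described in the abstract, a prompt combining the programming task specification with a student's submission, can be sketched as follows. This is a minimal illustration under assumed names and message layout; the paper's exact prompt wording is not reproduced here, and the actual model call is omitted.

```python
# Hypothetical sketch: build a chat prompt pairing a task specification
# with a student submission, as described in the abstract. The function
# name, system instruction, and message layout are illustrative
# assumptions, not the authors' exact prompt.

def build_feedback_prompt(task_spec: str, submission: str) -> list:
    """Return chat messages combining the exercise text and student code."""
    system = (
        "You are a tutor in an introductory programming course. "
        "Give feedback on the student's submission for the task below. "
        "Point out faults, but do not reveal a full solution."
    )
    user = (
        f"Task specification:\n{task_spec}\n\n"
        f"Student submission:\n{submission}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_feedback_prompt(
    "Write a function that returns the sum of a list of integers.",
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s",
)
# These messages could then be passed to a chat-completion endpoint
# (e.g. a GPT-4 Turbo model); the network call itself is omitted here.
```

Keeping the task specification and the submission in one user message mirrors the paper's described input; an e-assessment system would typically fill both fields automatically per submission.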