Evaluating and Optimizing Educational Content with Large Language Model Judgments
arXiv (2024)

Abstract
Creating effective educational materials generally requires expensive and
time-consuming studies of student learning outcomes. To overcome this barrier,
one idea is to build computational models of student learning and use them to
optimize instructional materials. However, the cognitive dynamics of
learning are difficult to model. We propose an alternative approach
that uses Language Models (LMs) as educational experts to assess the impact of
various instructions on learning outcomes. Specifically, we use GPT-3.5 to
evaluate the overall effect of instructional materials on different student
groups and find that it can replicate well-established educational findings
such as the Expertise Reversal Effect and the Variability Effect. This
demonstrates the potential of LMs as reliable evaluators of educational
content. Building on this insight, we introduce an instruction optimization
approach in which one LM generates instructional materials using the judgments
of another LM as a reward function. We apply this approach to create math word
problem worksheets aimed at maximizing student learning gains. Human teachers'
evaluations of these LM-generated worksheets show a significant alignment
between the LM judgments and human teacher preferences. We conclude by
discussing potential divergences between human and LM opinions and the
resulting pitfalls of automating instructional design.
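The optimization loop described above can be sketched as a best-of-N search in which a generator LM proposes candidate worksheets and a judge LM scores each one, with the judge's score acting as the reward. The sketch below is a minimal illustration, not the paper's implementation: both LM calls are replaced by hypothetical stubs, and the function names (`generate_worksheet`, `judge_score`, `optimize`) are assumptions for this example only.

```python
import random

# Hypothetical stand-ins for the two LM calls. In the paper's setting,
# generate_worksheet would prompt a generator LM (e.g. GPT-3.5) for a
# math word-problem worksheet, and judge_score would prompt a judge LM
# to rate its expected learning gain for a given student group.
def generate_worksheet(seed: int) -> str:
    """Generator LM (stubbed): propose one candidate worksheet."""
    return f"worksheet-variant-{seed}"

def judge_score(worksheet: str) -> float:
    """Judge LM (stubbed): deterministic pseudo-score in [0, 1)."""
    rng = random.Random(worksheet)  # seed on text so the stub is repeatable
    return rng.random()

def optimize(n_candidates: int = 8) -> str:
    """Best-of-N search: generate candidates, keep the judge's favorite."""
    candidates = [generate_worksheet(i) for i in range(n_candidates)]
    return max(candidates, key=judge_score)

best = optimize()
```

In practice the loop can be made iterative (feeding the judge's critique back into the next generation round), but even this one-shot reranking form captures the core idea of using one LM's judgment as the reward signal for another.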