Large Language Model Augmented Exercise Retrieval for Personalized Language Learning
CoRR (2024)
Abstract
We study the problem of zero-shot exercise retrieval in the context of online
language learning, to give learners the ability to explicitly request
personalized exercises via natural language. Using real-world data collected
from language learners, we observe that vector similarity approaches poorly
capture the relationship between exercise content and the language that
learners use to express what they want to learn. This semantic gap between
queries and content dramatically reduces the effectiveness of general-purpose
retrieval models pretrained on large-scale information retrieval datasets such as
MS MARCO. We leverage the generative capabilities of large language models to
bridge the gap by synthesizing hypothetical exercises based on the learner's
input, which are then used to search for relevant exercises. Our approach,
which we call mHyER, overcomes three challenges: (1) lack of relevance labels
for training, (2) unrestricted learner input content, and (3) low semantic
similarity between input and retrieval candidates. mHyER outperforms several
strong baselines on two novel benchmarks created from crowdsourced data and
publicly available data.
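The core retrieval idea described above (generate hypothetical exercises from the learner's request, then search by similarity to those generated exercises rather than to the raw query) can be sketched as follows. This is an illustrative toy, not the paper's implementation: the LLM generation step is replaced by a hardcoded stub (`generate_hypothetical_exercises`), and the embedding is a bag-of-words vector instead of a pretrained neural encoder; function names and the example data are assumptions for illustration.

```python
import math
import re

# A tiny candidate pool of exercises (illustrative data, not from the paper).
EXERCISES = [
    "Order food at a restaurant: complete the dialogue with the waiter.",
    "Translate these greetings: hello, good morning, good evening.",
    "Conjugate the verb 'to eat' in the present tense.",
]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def embed(text, vocab):
    # Toy L2-normalized bag-of-words embedding; a real system would use
    # a pretrained text encoder here.
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def generate_hypothetical_exercises(query):
    # Stand-in for the LLM step: mHyER would prompt an LLM to synthesize
    # exercises matching the learner's request. Hardcoded here for illustration.
    return [
        "Practice a dialogue: order food at a restaurant "
        "and ask the waiter for the menu."
    ]

def retrieve(query, exercises, k=2):
    # Bridge the query-content semantic gap: embed the *generated* exercises
    # (which resemble real exercise content) instead of the raw learner query,
    # then rank candidates by cosine similarity.
    vocab = {w: i for i, w in enumerate(
        sorted({t for e in exercises for t in tokenize(e)}))}
    hyp_vecs = [embed(h, vocab) for h in generate_hypothetical_exercises(query)]
    q = [sum(col) / len(hyp_vecs) for col in zip(*hyp_vecs)]
    scored = sorted(
        exercises,
        key=lambda e: -sum(a * b for a, b in zip(q, embed(e, vocab))))
    return scored[:k]

top = retrieve("how do I order food when eating out", EXERCISES)
```

The learner's colloquial request shares few words with any exercise, but the generated hypothetical exercise does, so the restaurant dialogue ranks first.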