Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs
arXiv (2024)
Abstract
Distilling explicit chain-of-thought reasoning paths has emerged as an
effective method for improving the reasoning abilities of large language models
(LLMs) across various tasks. However, when tackling complex tasks that pose
significant challenges for state-of-the-art models, this technique often
struggles to produce effective chains of thought that lead to correct answers.
In this work, we propose a novel approach to distill reasoning abilities from
LLMs by leveraging their capacity to explain solutions. We apply our method to
solving competitive-level programming challenges. More specifically, we employ
an LLM to generate explanations for a set of (problem, solution program) pairs,
then use the resulting (problem, explanation) pairs to fine-tune a smaller language model,
which we refer to as the Reasoner, to learn algorithmic reasoning that can
generate "how-to-solve" hints for unseen problems. Our experiments demonstrate
that learning from explanations enables the Reasoner to more effectively guide
program implementation by a Coder, resulting in higher solve rates than strong
chain-of-thought baselines on competitive-level programming problems. It also
outperforms models that learn directly from (problem, solution program) pairs.
We curated an additional test set in the CodeContests format, which includes
246 more recent problems posted after the models' knowledge cutoff.
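
As a rough illustration of the pipeline the abstract describes, the sketch below outlines the three stages in Python: the teacher LLM explains known (problem, solution program) pairs, a smaller Reasoner is fine-tuned on the resulting (problem, explanation) pairs, and at test time the Reasoner's "how-to-solve" hint guides a Coder. The helper names (query_llm, finetune_reasoner, the reasoner/coder objects and their generate methods) are hypothetical placeholders, not the paper's actual implementation or API.

```python
# Sketch of the explanation-distillation pipeline described in the abstract.
# All model-calling helpers are hypothetical placeholders; the paper does not
# prescribe a specific API.

from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    problem: str            # natural-language problem statement
    solution_program: str   # known-correct reference program


def query_llm(prompt: str) -> str:
    """Hypothetical call to a large teacher LLM."""
    raise NotImplementedError


# Step 1: the teacher LLM explains each (problem, solution program) pair,
# producing a "how-to-solve" explanation grounded in a correct solution.
def build_explanation_dataset(examples: List[Example]) -> List[dict]:
    dataset = []
    for ex in examples:
        prompt = (
            "Explain, step by step, how to solve the following problem, "
            "using the reference solution as a guide.\n\n"
            f"Problem:\n{ex.problem}\n\nSolution program:\n{ex.solution_program}"
        )
        explanation = query_llm(prompt)
        # Only (problem, explanation) pairs are kept for training the Reasoner;
        # the solution program itself is not a training target.
        dataset.append({"input": ex.problem, "target": explanation})
    return dataset


# Step 2: fine-tune a smaller model (the Reasoner) on (problem, explanation) pairs.
def finetune_reasoner(dataset: List[dict]):
    """Hypothetical fine-tuning routine for a smaller language model."""
    raise NotImplementedError


# Step 3: at test time, the Reasoner produces a "how-to-solve" hint for an
# unseen problem, and a separate Coder conditions on it to write the program.
def solve(problem: str, reasoner, coder) -> str:
    hint = reasoner.generate(problem)  # algorithmic reasoning hint
    return coder.generate(f"{problem}\n\nHint:\n{hint}")  # program implementation
```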