Small Language Models Need Strong Verifiers to Self-Correct Reasoning
arxiv(2024)
摘要
Self-correction has emerged as a promising solution to boost the reasoning
performance of large language models (LLMs), where LLMs refine their solutions
using self-generated critiques that pinpoint the errors. This work explores
whether smaller-size (<= 13B) language models (LMs) have the ability of
self-correction on reasoning tasks with minimal inputs from stronger LMs. We
propose a novel pipeline that prompts smaller LMs to collect self-correction
data that supports the training of self-refinement abilities. First, we
leverage correct solutions to guide the model in critiquing their incorrect
responses. Second, the generated critiques, after filtering, are used for
supervised fine-tuning of the self-correcting reasoner through solution
refinement. Our experimental results show improved self-correction abilities of
two models on five datasets spanning math and commonsense reasoning, with
notable performance gains when paired with a strong GPT-4-based verifier,
though limitations are identified when using a weak self-verifier for
determining when to correct.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要