From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision
arXiv (2024)
Abstract
Addressing the high annotation cost of solving Math Word
Problems (MWPs) under full supervision with intermediate equations, recent
works have proposed a weakly supervised task setting that relies solely on the
final answer as the supervisory signal. Existing leading approaches typically
employ various search techniques to infer intermediate equations, but they cannot
ensure that those equations are semantically consistent with the natural language
problem descriptions. The rise of Large Language Models (LLMs) such as ChatGPT has
opened up new possibilities for addressing MWPs directly. However, the
computational demands of LLMs make them less than ideal for resource-constrained
settings. In light of these challenges, we introduce a two-stage framework that
transfers mathematical expertise from large to tiny language models. In the
Distillation Stage, we propose a series of extraction processes that
satisfy the properties of MWPs to distill mathematical knowledge from LLMs and
construct the problem-equation pairs required for supervised training. In the
Refinement Stage, because knowledge distillation cannot guarantee that all data
are fully utilized, we further exploit the unsuccessfully searched data with a
Knowledge Refining method. Finally, we train a small model on the distilled data
generated by the two-stage process. Because our method fully leverages the
semantic understanding capabilities of LLMs while searching for
problem-equation pairs, it achieves significantly better performance on
the Math23K and Weak12K datasets than existing small-model methods,
while maintaining a much lower computational cost than ChatGPT.
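The abstract does not give implementation details, but the core consistency check implied by the Distillation Stage, keeping an LLM-produced equation only if it reproduces the known final answer, can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the `distill_pair` function, the regex-based equation extraction, and the `generate` callable are all assumed names, and the paper's extraction processes enforce additional MWP-specific properties that this sketch omits.

```python
import ast
import operator
import re
from typing import Callable, Optional, Tuple

# Arithmetic operators permitted when checking a candidate equation.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def _eval(node) -> float:
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")

def distill_pair(
    problem: str,
    gold_answer: float,
    generate: Callable[[str], str],  # e.g. a wrapper around an LLM API
    tol: float = 1e-4,
) -> Optional[Tuple[str, str]]:
    """Return a (problem, equation) training pair if the LLM-produced
    equation reproduces the gold final answer; return None otherwise,
    so the example can instead be handled by the Refinement Stage."""
    response = generate(problem)
    # Pull the first arithmetic expression out of the free-form response.
    match = re.search(r"[-\d(][\d.\s+\-*/()]*", response)
    if match is None:
        return None
    equation = match.group().strip()
    try:
        value = _eval(ast.parse(equation, mode="eval"))
    except (ValueError, SyntaxError, ZeroDivisionError):
        return None
    return (problem, equation) if abs(value - gold_answer) < tol else None

# Usage with a stubbed LLM:
if __name__ == "__main__":
    stub = lambda p: "The answer is (3 + 4) * 2"
    print(distill_pair("Tom buys 3 and 4 apples on each of 2 days ...", 14.0, stub))
```

Only the pairs that pass this answer-consistency filter would feed supervised training of the small model; the remainder, under the paper's description, are recovered by the Knowledge Refining method rather than discarded.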