Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs
CoRR (2024)
Abstract
Large language models (LLMs) have achieved impressive human-like performance
across various reasoning tasks. However, their mastery of underlying
inferential rules still falls short of human capabilities. To investigate this,
we propose a logic scaffolding inferential rule generation framework to
construct an inferential rule base, ULogic, comprising both primitive and
compositional rules across five domains. Our analysis of GPT-series models over
a rule subset reveals significant gaps in LLMs' logic understanding compared to
human performance, especially for compositional and structurally complex rules
exhibiting certain bias patterns. We further distill these rules into a
smaller-scale inference engine for flexible rule generation and enhanced
downstream reasoning. Through a multi-judger evaluation, our inference engine
proves effective in generating accurate, complex and abstract conclusions and
premises, and improves various commonsense reasoning tasks. Overall, our work
sheds light on LLMs' limitations in grasping inferential rules and suggests
ways to enhance their logical reasoning abilities. [Code and data are
available at .]