TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning
CoRR (2024)
Abstract
The development of Large Language Models (LLMs) often confronts challenges
stemming from the heavy reliance on human annotators in the reinforcement
learning from human feedback (RLHF) framework, or the frequent and costly
external queries tied to the self-instruct paradigm. In this work, we pivot to
Reinforcement Learning (RL) – but with a twist. Unlike typical RLHF, which
refines LLMs after they are trained on instruction data, we use RL to
directly generate the foundational instruction dataset that alone suffices for
fine-tuning. Our method, TeaMs-RL, uses a suite of textual operations and
rules, prioritizing the diversification of training datasets. It facilitates
the generation of high-quality data without excessive reliance on external
advanced models, paving the way for a single fine-tuning step and negating the
need for subsequent RLHF stages. Our findings highlight key advantages of our
approach: reduced need for human involvement and fewer model queries (only
5.73% of WizardLM's total), along with enhanced capabilities of LLMs in
crafting and comprehending complex instructions compared to strong baselines,
and substantially improved model privacy protection.