Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
CoRR(2024)
Abstract
Reinforcement Learning with Human Feedback (RLHF) has received significant
attention for performing tasks without the need for costly manual reward design
by aligning with human preferences. It is crucial to consider diverse human feedback
types and various learning methods in different environments. However,
quantifying progress in RLHF with diverse feedback is challenging due to the
lack of standardized annotation platforms and widely used unified benchmarks.
To bridge this gap, we introduce Uni-RLHF, a comprehensive system
implementation tailored for RLHF. It aims to provide a complete workflow from
real human feedback, fostering progress on practical problems.
Uni-RLHF contains three packages: 1) a universal multi-feedback
annotation platform, 2) large-scale crowdsourced feedback datasets, and 3)
modular offline RLHF baseline implementations. Uni-RLHF develops a
user-friendly annotation interface tailored to various feedback types,
compatible with a wide range of mainstream RL environments. We then establish a
systematic pipeline of crowdsourced annotations, resulting in large-scale
annotated datasets comprising more than 15 million steps across 30+ popular
tasks. Extensive experiments show that results obtained on the collected
datasets are competitive with those obtained from well-designed manual
rewards. We evaluate various design choices and offer insights into their
strengths and potential areas of improvement. We hope to build valuable
open-source platforms, datasets, and baselines to facilitate the development of
more robust and reliable RLHF solutions based on realistic human feedback. The
website is available at https://uni-rlhf.github.io/.
Keywords
RLHF, Diverse Human Feedback, Reinforcement Learning