Rating-Based Reinforcement Learning

Devin White, Mingkang Wu, Ellen Novoseller, Vernon J. Lawhern, Nicholas Waytowich, Yongcan Cao

AAAI 2024 (2024)

Abstract
This paper develops a novel rating-based reinforcement learning approach that uses human ratings as the source of human guidance in reinforcement learning. Unlike existing preference-based and ranking-based reinforcement learning paradigms, which rely on human relative preferences over sample pairs, the proposed approach relies on human evaluations of individual trajectories, with no relative comparison between sample pairs. The approach builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies with both synthetic ratings and real human ratings to evaluate the effectiveness and benefits of the new approach.
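The abstract does not give the form of the rating prediction model or of the multi-class loss, so the following is only a generic illustration of what learning from per-trajectory rating classes can look like. It assumes a hypothetical model that outputs one logit per rating class for a single trajectory and scores it with ordinary softmax cross-entropy; this is a stand-in, not the loss proposed in the paper, and all names and numbers are made up for the example.

```python
import math


def softmax(logits):
    """Convert per-class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rating_cross_entropy(logits, rating):
    """Negative log-probability of the human-assigned rating class.

    `logits` is a hypothetical model output for ONE trajectory,
    `rating` is the index of the class a human assigned to it.
    """
    probs = softmax(logits)
    return -math.log(probs[rating])


def batch_loss(batch):
    """Mean cross-entropy over (logits, rating) pairs."""
    return sum(rating_cross_entropy(l, r) for l, r in batch) / len(batch)


# Made-up example with 3 rating classes (e.g. bad / ok / good).
batch = [([2.0, 0.5, -1.0], 0), ([0.1, 0.2, 1.5], 2)]
mean_loss = batch_loss(batch)

# With uniform logits the loss equals log(number of classes).
uniform_loss = rating_cross_entropy([0.0, 0.0, 0.0], 1)
```

The point of the sketch is that each trajectory is labeled on its own, so a training example is a single (trajectory, rating) pair and not a comparison between two trajectories.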
Keywords
HAI: Learning Human Values and Preferences, HAI: Human-in-the-loop Machine Learning