Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior

IEEE Transactions on Artificial Intelligence (2024)

Abstract
Value alignment is a property of intelligent agents wherein they solely pursue non-harmful behaviors or human-beneficial goals. We introduce an approach to value-aligned reinforcement learning, in which we train an agent with two reward signals: a standard task performance reward plus a normative behavior reward. The normative behavior reward is derived from a value-aligned prior model that we train using naturally occurring stories. These stories encode societal norms and can be used to classify text as normative or non-normative. We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as more normative. We test our value-alignment technique on three interactive text-based worlds; each world is designed specifically to challenge agents with a task as well as provide opportunities to deviate from the task to engage in normative and/or altruistic behavior.
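The abstract describes combining a task performance reward with a reward from a story-trained normative prior via policy shaping. The following is a minimal Python sketch of that idea, assuming a softmax task policy and a hypothetical normativity classifier over action text; the names and the multiply-and-renormalize mixing scheme are illustrative assumptions, not the paper's implementation.

# Minimal sketch (not the authors' code) of policy shaping with a normative prior:
# the task-driven action distribution is multiplied by the prior's per-action
# probability that the action text is normative, then renormalized.
import numpy as np

def normative_scores(action_texts, classifier):
    # Probability that each candidate action is normative, according to a
    # (hypothetical) binary text classifier trained on naturally occurring stories.
    return np.array([classifier(text) for text in action_texts])

def shape_policy(task_q_values, norm_probs, temperature=1.0):
    # Softmax over task Q-values gives the task policy; multiplying by the
    # normative prior and renormalizing is one common policy-shaping form.
    task_probs = np.exp(task_q_values / temperature)
    task_probs /= task_probs.sum()
    combined = task_probs * norm_probs
    return combined / combined.sum()

# Toy usage with three candidate actions in a text-based world.
actions = ["steal the bread", "buy the bread", "walk away"]
q_values = np.array([2.0, 1.5, 0.1])                     # task reward estimates
toy_classifier = lambda t: 0.1 if "steal" in t else 0.9  # stand-in normative prior
policy = shape_policy(q_values, normative_scores(actions, toy_classifier))
print(dict(zip(actions, policy.round(3))))

In this toy example the prior down-weights the high-task-value but non-normative action, illustrating how the two reward sources can be balanced without discarding task performance entirely.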
Keywords
Autonomous Agents, Natural Language Processing, Reinforcement Learning