What Would Jiminy Cricket Do? Towards Agents That Behave Morally.

Annual Conference on Neural Information Processing Systems(2021)

引用 33|浏览54
暂无评分
摘要
When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong, to behave morally. By contrast, artificial agents may behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environments, mitigating inherited biases towards immoral behavior will become necessary. However, prior work on aligning agents with human values and morals focuses on small-scale settings lacking in semantic complexity. To enable research in larger, more realistic settings, we introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of semantically rich, morally salient scenarios. Via dense annotations for every possible action, Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward. To improve moral behavior, we leverage language models with commonsense moral knowledge and develop strategies to mediate this knowledge into actions. In extensive experiments, we find that our artificial conscience approach can steer agents towards moral behavior without sacrificing performance.
更多
查看译文
关键词
jiminy cricket,agents
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要