Explaining Agent Preferences and Behavior: Integrating Reward Decomposition and Contrastive Highlights

AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems (2023)

Abstract

Explainable reinforcement learning methods aim to elucidate agent policies and the decision-making processes behind them. One such method is reward decomposition, which reveals an agent's preferences in a specific world-state by presenting its expected utility decomposed into the different components of the reward function. While this approach quantifies the expected decomposed rewards for alternative actions, it does not demonstrate the outcomes of these alternative actions in terms of the agent's behavior. This work introduces "Contrastive Highlights", a novel local explanation method that visually compares the agent's chosen behavior to an alternative choice of action in a contrastive manner. We conducted user studies comparing participants' understanding of agents' preferences based on either reward decomposition, contrastive highlights, or a combination of both approaches. Our results show that integrating reward decomposition with contrastive highlights significantly improved participants' performance compared to using either approach on its own.
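
To make the idea of reward decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes the agent's expected utility for each action is stored per reward component and that the total is the sum over components. The component names, action names, and numeric values are all hypothetical and chosen only for illustration.

```python
# Hypothetical reward components and actions (not taken from the paper).
COMPONENTS = ["progress", "safety", "fuel"]
ACTIONS = ["left", "stay", "right"]

# Assumed decomposed expected utilities for ONE world-state:
# Q_COMPONENTS[c][a] is the expected return of action a under component c.
Q_COMPONENTS = {
    "progress": [1.0, 0.5, 2.0],
    "safety": [0.5, 1.0, -1.5],
    "fuel": [-0.25, 0.0, -0.25],
}


def total_q(q_components):
    """Sum the per-component values to get the overall value of each action."""
    return [
        sum(q_components[c][a] for c in COMPONENTS)
        for a in range(len(ACTIONS))
    ]


def greedy_action(q_components):
    """Index of the action with the highest total value."""
    totals = total_q(q_components)
    return max(range(len(totals)), key=totals.__getitem__)


def contrast(q_components, chosen, alternative):
    """Per-component advantage of the chosen action over an alternative."""
    return {
        c: q_components[c][chosen] - q_components[c][alternative]
        for c in COMPONENTS
    }


chosen = greedy_action(Q_COMPONENTS)
alternative = ACTIONS.index("right")
print("chosen action:", ACTIONS[chosen])
print("chosen vs. right:", contrast(Q_COMPONENTS, chosen, alternative))
```

In this toy state the per-component differences show which parts of the reward favor the chosen action and which favor the alternative. As the abstract notes, such numbers alone do not show how the two actions would play out in the agent's behavior, which is the gap that contrastive highlights are meant to fill.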
