Pruning Dominated Policies in Multiobjective Pareto Q-Learning

Lawrence Mandow, José-Luis Pérez-de-la-Cruz

Advances in Artificial Intelligence, CAEPIA 2018 (2018)

Abstract
The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the full set of Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence.
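The notion of a "set of Pareto-optimal policies" rests on Pareto dominance between reward vectors. The sketch below is a generic illustration, not the MPQ-learning update or the authors' pruning rule. It assumes every objective is maximized and shows a dominance test plus a filter that keeps only nondominated value vectors. The sample vectors are hypothetical. For instance, `(10.0, -5.0)` could stand for the same outcome as `(10.0, -3.0)` reached through a useless detour loop that only adds step costs. Dominance makes such a looping alternative redundant.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v, assuming every objective is maximized:
    u is at least as good in every component and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def nondominated(vectors: List[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector in the list dominates.
    Exact duplicates are kept once, in first-seen order."""
    front: List[Vector] = []
    for v in vectors:
        if v in front:
            continue  # already kept an identical vector
        if not any(dominates(u, v) for u in vectors):
            front.append(v)
    return front


if __name__ == "__main__":
    # Hypothetical (goal reward, accumulated step cost) return vectors.
    returns = [(10.0, -3.0), (10.0, -5.0), (4.0, -1.0), (2.0, -4.0)]
    print(nondominated(returns))  # prints [(10.0, -3.0), (4.0, -1.0)]
```

Neither surviving vector dominates the other, because each is better in one objective. Both therefore remain in the approximated Pareto set. The looping variant `(10.0, -5.0)` and the weak option `(2.0, -4.0)` are discarded.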
Keywords
Pareto-optimal Policies, Training Step, Nonstationary Policies, Reward Vector, Multiobjective Reinforcement Learning