Policy gradient approaches for multi-objective sequential decision making: A comparison

Neural Networks (2014)

Cited by 23
Abstract
This paper investigates the use of policy gradient techniques to approximate the Pareto frontier in Multi-Objective Markov Decision Processes (MOMDPs). Despite the popularity of policy-gradient algorithms, and the fact that gradient-ascent algorithms have already been proposed to numerically solve multi-objective optimization problems, especially in combination with multi-objective evolutionary algorithms, little attention has so far been paid to the use of gradient information for multi-objective sequential decision problems. Three different Multi-Objective Reinforcement-Learning (MORL) approaches are presented here. The first two, called radial and Pareto following, start from an initial policy and perform gradient-based policy-search procedures aimed at finding a set of non-dominated policies. In contrast, the third approach performs a single gradient-ascent run that, at each step, generates an improved continuous approximation of the Pareto frontier: the parameters of a function that defines a manifold in the policy-parameter space are updated along the gradient of a performance criterion, so that the sequence of candidate solutions gets as close as possible to the Pareto front. Besides reviewing the three approaches and discussing their main properties, we empirically compare them with other MORL algorithms on two interesting MOMDPs.
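To illustrate the general idea behind the first family of methods, the following is a minimal sketch of a radial-style procedure on a toy problem: each "ray" is a fixed weight vector on the simplex, and a REINFORCE-style gradient ascent on the linearly scalarized return is run along that ray to obtain one candidate Pareto-optimal policy per direction. The toy two-objective problem, the one-dimensional Gaussian policy, and all names here are illustrative assumptions, not the paper's actual algorithm or experimental setup.

```python
# Illustrative sketch only (assumptions: toy 1-D Gaussian policy, linear scalarization).
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.2  # fixed policy standard deviation

def sample_returns(theta, n=64):
    """Toy stochastic two-objective returns: objective 1 prefers actions
    near +1, objective 2 prefers actions near -1."""
    actions = rng.normal(theta, SIGMA, size=n)
    r1 = -(actions - 1.0) ** 2
    r2 = -(actions + 1.0) ** 2
    return actions, np.stack([r1, r2], axis=1)

def reinforce_gradient(theta, lam, n=64):
    """Likelihood-ratio (REINFORCE) estimate of the gradient of the
    lambda-scalarized return with respect to the policy mean theta."""
    actions, rets = sample_returns(theta, n)
    scalarized = rets @ lam                       # linear scalarization
    score = (actions - theta) / SIGMA ** 2        # d log pi / d theta for a Gaussian
    return np.mean(score * (scalarized - scalarized.mean()))

pareto_candidates = []
for w in np.linspace(0.0, 1.0, 5):                # one "ray" per weight vector
    lam = np.array([w, 1.0 - w])
    theta = 0.0                                   # shared initial policy
    for _ in range(200):                          # gradient ascent along the ray
        theta += 0.05 * reinforce_gradient(theta, lam)
    _, rets = sample_returns(theta, n=2000)
    pareto_candidates.append(rets.mean(axis=0))   # estimated (J1, J2) of the policy

print(np.round(pareto_candidates, 2))
```

Each weight vector yields one non-dominated candidate; sweeping the weights traces out a discrete approximation of the Pareto frontier, which is the kind of output the radial and Pareto-following approaches produce (the paper's manifold-based approach instead optimizes a continuous parameterization of the frontier).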
Keywords
Pareto optimisation, approximation theory, decision making, evolutionary computation, gradient methods, learning (artificial intelligence), MOMDPs, MORL approaches, Pareto following, Pareto frontier approximation, gradient-ascent algorithms, gradient-based policy-search procedures, multiobjective Markov decision processes, multiobjective evolutionary algorithms, multiobjective optimization problems, multiobjective reinforcement-learning approaches, multiobjective sequential decision making, nondominated policies, performance criterion, policy gradient approaches, policy-gradient algorithms, radial following