Deep Conservative Policy Iteration

CoRR, 2019.

Cited by: 0|Views6
EI

Abstract:

Conservative Policy Iteration (CPI) is a founding algorithm of Approximate Dynamic Programming (ADP). Its core principle is to stabilize greediness through stochastic mixtures of consecutive policies. It comes with strong theoretical guarantees, and inspired approaches in deep Reinforcement Learning (RL). However, CPI itself has rarely ...More

Code:

Data:

Full Text
Bibtex
Your rating :
0

 

Tags
Comments