Boosting Trust Region Policy Optimization by Normalizing Flows Policy
arXiv: Artificial Intelligence, Volume abs/1809.10326, 2018.
We propose to improve trust region policy search with a normalizing flows policy. We illustrate that when the trust region is constructed by a KL divergence constraint, a normalizing flows policy can generate samples far from the 'center' of the previous policy iterate, which potentially enables better exploration and helps avoid bad lo...
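To make the idea of a normalizing flows policy concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the architecture used in the paper: the action is 2-D, the flow is a single affine coupling layer, state conditioning is left out, and the names `coupling_forward`, `coupling_inverse`, `sample_action`, `w`, and `b` are hypothetical. It shows the two ingredients a policy-gradient method would need: sampling an action by transforming Gaussian noise, and computing its log-probability with the change-of-variables formula.

```python
import numpy as np


def coupling_forward(z, w, b):
    """Map base noise z (..., 2) to an action with one affine coupling layer.

    The first coordinate passes through unchanged; the second is scaled and
    shifted by functions of the first. Returns the action and log|det J|.
    (Hypothetical toy parameterization: w and b are length-2 arrays.)
    """
    z1, z2 = z[..., 0], z[..., 1]
    log_scale = np.tanh(w[0] * z1 + b[0])  # bounded log-scale for stability
    shift = w[1] * z1 + b[1]
    a2 = z2 * np.exp(log_scale) + shift
    action = np.stack([z1, a2], axis=-1)
    return action, log_scale  # Jacobian is triangular, so log|det| = log_scale


def coupling_inverse(action, w, b):
    """Invert coupling_forward, recovering the base noise from an action."""
    a1, a2 = action[..., 0], action[..., 1]
    log_scale = np.tanh(w[0] * a1 + b[0])
    shift = w[1] * a1 + b[1]
    z2 = (a2 - shift) * np.exp(-log_scale)
    return np.stack([a1, z2], axis=-1)


def sample_action(rng, w, b, n):
    """Draw n actions and their log-probabilities under the flow policy."""
    z = rng.standard_normal((n, 2))
    # Log-density of a 2-D standard normal at z.
    base_logp = -0.5 * np.sum(z ** 2, axis=-1) - np.log(2.0 * np.pi)
    action, log_det = coupling_forward(z, w, b)
    # Change of variables: log p(a) = log p(z) - log|det da/dz|.
    return action, base_logp - log_det


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.array([0.7, -1.3])  # assumed toy parameters
    b = np.array([0.1, 0.4])
    actions, logp = sample_action(rng, w, b, 5)
    print(actions)
    print(logp)
```

Because the transform is nonlinear in the first coordinate, the resulting action distribution is no longer Gaussian, which is the property the abstract relies on. In a real policy the parameters would also depend on the state.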