Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients
AAMAS '20: International Conference on Autonomous Agents and Multiagent Systems, Auckland, New Zealand, May 2020, pp. 492-501.
Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor ...