Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients

Dustin Morrill
Audrunas Gruslys
Jean-Baptiste Lespiau
Paavo Parmas
Edgar Duéñez-Guzmán

AAMAS '20: International Conference on Autonomous Agents and Multiagent Systems, Auckland, New Zealand, May 2020, pp. 492-501.

Abstract:

Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor ...
