Monte-Carlo Tree Search as Regularized Policy Optimization

ICML, pp. 3769-3778, 2020.

We showed that the action selection formula used in Monte-Carlo tree search algorithms, most notably AlphaZero, approximates the solution to a regularized policy optimization problem formulated with search Q-values.

Abstract:

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristi...
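
To make the summary's claim concrete, here is a minimal Python sketch, not the authors' code. It places a PUCT-style action selection rule of the kind commonly associated with AlphaZero next to a policy that maximizes expected search Q-value minus a KL penalty toward the prior. The function names, the default constant `c`, the KL-based form of the regularizer, and the free parameter `lam` are all assumptions made for illustration. The text above does not specify them.

```python
import math


def puct_select(q, prior, visits, c=1.25):
    """Pick the action maximizing a PUCT-style score (illustrative form).

    q, prior, visits are per-action lists; c is a hypothetical exploration constant.
    """
    total = sum(visits)
    scores = [
        q[a] + c * prior[a] * math.sqrt(total) / (1 + visits[a])
        for a in range(len(q))
    ]
    return max(range(len(q)), key=scores.__getitem__)


def regularized_policy(q, prior, lam, iters=100):
    """Maximize sum_a q[a]*y[a] - lam * KL(prior, y) over distributions y.

    Assumes lam > 0 and a strictly positive prior. Stationarity gives
    y[a] = lam * prior[a] / (alpha - q[a]); the normalizer alpha is found
    by bisection, since the total mass decreases as alpha grows.
    """
    lo = max(qa + lam * pa for qa, pa in zip(q, prior))  # mass >= 1 here
    hi = max(q) + lam                                    # mass <= 1 here
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        mass = sum(lam * pa / (alpha - qa) for qa, pa in zip(q, prior))
        if mass > 1.0:
            lo = alpha
        else:
            hi = alpha
    alpha = 0.5 * (lo + hi)
    return [lam * pa / (alpha - qa) for qa, pa in zip(q, prior)]


if __name__ == "__main__":
    q = [0.1, 0.5, 0.2]
    prior = [0.3, 0.3, 0.4]
    print(puct_select(q, prior, visits=[1, 10, 1]))
    print(regularized_policy(q, prior, lam=0.5))
```

In this sketch a large `lam` keeps the regularized policy close to the prior, while a small `lam` concentrates it on the action with the highest Q-value. That mirrors how the exploration bonus in the selection rule matters less as visit counts grow.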
