Monte-Carlo Tree Search as Regularized Policy Optimization
ICML, pp. 3769-3778, 2020.
We showed that the action selection formula used in Monte-Carlo tree search algorithms, most notably AlphaZero, approximates the solution to a regularized policy optimization problem formulated with search Q-values.
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristi...
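To make the claim concrete, the following is a minimal sketch rather than the authors' implementation. It places an AlphaZero-style selection rule next to one common form of regularized policy optimization: maximize the expected search Q-value minus a weighted KL divergence from the prior policy. The function names, the constant `c`, the weight `lam`, and the bisection solver are all assumptions made for illustration, and the prior is assumed to be strictly positive.

```python
import math


def puct_action(q, prior, visits, c=1.25):
    """AlphaZero-style selection (sketch): pick the action maximizing
    q[a] + c * prior[a] * sqrt(total visits) / (1 + visits[a]).
    The value of c is an illustrative assumption."""
    total = sum(visits)
    scores = [
        q[a] + c * prior[a] * math.sqrt(total) / (1 + visits[a])
        for a in range(len(q))
    ]
    return max(range(len(q)), key=scores.__getitem__)


def regularized_policy(q, prior, lam, iters=100):
    """Sketch: maximize sum_a q[a]*y[a] - lam * KL(prior, y) over the simplex.
    Stationarity gives y[a] = lam * prior[a] / (alpha - q[a]), where the
    scalar alpha is chosen so that y sums to 1. We find alpha by bisection.
    Assumes prior[a] > 0 for every action and lam > 0."""
    # At `lo` the candidate policy sums to at least 1; at `hi` to at most 1.
    lo = max(qa + lam * pa for qa, pa in zip(q, prior))
    hi = max(q) + lam
    for _ in range(iters):
        alpha = 0.5 * (lo + hi)
        total = sum(lam * pa / (alpha - qa) for qa, pa in zip(q, prior))
        if total > 1.0:
            lo = alpha  # policy mass too large: increase alpha
        else:
            hi = alpha
    alpha = 0.5 * (lo + hi)
    y = [lam * pa / (alpha - qa) for qa, pa in zip(q, prior)]
    norm = sum(y)
    return [v / norm for v in y]  # renormalize away residual numerical error
```

In this sketch a large `lam` keeps the resulting policy close to the prior, while a small `lam` concentrates it on the action with the highest Q-value. This mirrors the way the exploration bonus in the selection rule shrinks as visit counts grow.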