Episodic Exploration for Deep Deterministic Policies for StarCraft Micromanagement
international conference on learning representations(2017)
摘要
We consider scenarios from the real-time strategy game StarCraft as benchmarks for reinforcement learning algorithms. We focus on micromanagement, that is, the short-term, low-level control of team members during a battle. We propose several scenarios that are challenging for reinforcement learning algorithms because the state- action space is very large, and there is no obvious feature representation for the value functions. We describe our approach to tackle the micromanagement scenarios with deep neural network controllers from raw state features given by the game engine. We also present a heuristic reinforcement learning algorithm which combines direct exploration in the policy space and backpropagation. This algorithm collects traces for learning using deterministic policies, which appears much more efficient than, e.g., e-greedy exploration. Experiments show that this algorithm allows to successfully learn non-trivial strategies for scenarios with armies of up to 15 agents, where both Q-learning and REINFORCE struggle.
更多查看译文
关键词
deep deterministic policies,exploration
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络