Ensemble Policy Distillation with Reduced Data Distribution Mismatch

2022 International Joint Conference on Neural Networks (IJCNN)

Abstract
Policy distillation is a model compression method for deep reinforcement learning, typically applied to deploy policies on mobile devices to reduce power consumption and inference time. However, obtaining complete and stable distilled policies is challenging, which limits the achievable compression ratio. In this work, we develop two policy distillation algorithms to address this problem. Our first algorithm, Ensemble Policy Distillation (EPD), borrows from supervised-learning knowledge distillation the idea of using an ensemble of teacher networks to provide diverse supervision for a compact student policy network. In the Deep Q-Network (DQN) framework, our experiments verify that highly compressed student networks distilled with EPD even outperform their teachers on numerous Atari games. We further analyze how the data distribution mismatch introduced by the teacher ensemble in EPD harms the teachers' own learning, and introduce a second algorithm, Double Policy Distillation (DPD), as a novel method to mitigate this mismatch. Empirical results show that DPD improves both the teachers' learning and the student's distillation on Atari games and continuous control tasks.
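To make the ensemble-distillation idea concrete, the following is a minimal sketch of a distillation loss for DQN-style teachers. It is not the paper's implementation: the choice to average the teachers' softened Q-value distributions, the KL direction, the temperature `tau`, and the network sizes are illustrative assumptions only.

```python
# Minimal sketch of an ensemble policy distillation loss (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

def epd_distillation_loss(student, teachers, states, tau=0.01):
    """KL divergence from the averaged teacher distribution to the student policy."""
    with torch.no_grad():
        # Average the teachers' softened action distributions (hypothetical aggregation).
        teacher_probs = torch.stack(
            [F.softmax(t(states) / tau, dim=-1) for t in teachers]
        ).mean(dim=0)
    student_log_probs = F.log_softmax(student(states) / tau, dim=-1)
    # F.kl_div expects log-probabilities for the student and probabilities for the target.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Toy usage: three teacher Q-networks supervise one smaller student network.
if __name__ == "__main__":
    obs_dim, n_actions = 8, 4
    def make_teacher():
        return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    teachers = [make_teacher() for _ in range(3)]
    student = nn.Sequential(nn.Linear(obs_dim, 16), nn.ReLU(), nn.Linear(16, n_actions))
    states = torch.randn(32, obs_dim)
    loss = epd_distillation_loss(student, teachers, states)
    loss.backward()
    print(loss.item())
```

In this sketch the student is trained on states drawn from some behavior distribution; which policy generates those states is exactly where the distribution mismatch discussed in the abstract arises.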
Keywords
reinforcement learning, ensemble learning, distribution mismatch, policy distillation