Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets

IJCAI, pp. 2929–2935, 2020.

Keywords:
multi-modal, strategy game, real-time strategy, generative adversarial nets, skill selection
TL;DR:
We propose Triple-GAIL, a novel multi-modal GAIL framework that learns skill selection and imitation jointly from both expert demonstrations and continuously generated experiences (which serve as data augmentation), by introducing an auxiliary skill selector

Abstract:

Generative adversarial imitation learning (GAIL) has shown promising results by taking advantage of generative adversarial nets, especially in the field of robot learning. However, the requirement of isolated single-modal demonstrations limits the scalability of the approach to real-world scenarios such as autonomous vehicles' demand for a variety of driving styles.

Introduction
  • Imitation learning aims to mimic expert behavior directly from human demonstrations, without designing an explicit reward signal as in reinforcement learning (RL) [1], [2], and has achieved success in a variety of tasks.
  • Recent work in imitation learning, especially generative adversarial imitation learning (GAIL) [3], optimizes a policy directly from expert demonstrations without estimating the corresponding reward function; it overcomes the compounding errors of behavioral cloning (BC) [4] and reduces the computational burden of inverse reinforcement learning (IRL) [5], [6].
Highlights
  • Imitation learning aims to mimic expert behavior directly from human demonstrations, without designing an explicit reward signal as in reinforcement learning (RL) [1], [2], and has achieved success in a variety of tasks
  • Existing imitation learning methods, including GAIL, mostly focus on reconstructing expert behavior under the assumption of a single modality
  • GAIL is a promising imitation learning method based on generative adversarial nets (GANs) [13]
  • In GAIL, the generator serves as a policy that imitates expert behavior by matching the state-action (s, a) distribution of the demonstrations, while the discriminator acts as a surrogate reward measuring the similarity between generated data and demonstration data (see the sketch after this list)
  • We propose Triple-GAIL, a novel multi-modal GAIL framework that learns skill selection and imitation jointly from both expert demonstrations and continuously generated experiences by introducing an auxiliary selector
  • The win rates under these two targeted instructions reach up to 90%, but fall below 60% when the agents run against mismatched built-in agents, as shown in Table 3
  • Experiments on a driving task and a real-time strategy game demonstrate that Triple-GAIL fits multi-modal behaviors closer to the demonstrators' and outperforms state-of-the-art methods
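As a concrete illustration of the generator/discriminator roles described above, the following is a minimal single-modal GAIL sketch in PyTorch. The toy state/action dimensions, network sizes, and the REINFORCE-style policy update are illustrative assumptions; the paper optimizes the policy with TRPO [14].

import torch
import torch.nn as nn

S_DIM, A_DIM = 4, 2  # assumed toy dimensions, for illustration only

# Generator: a stochastic policy pi(a|s). Discriminator: D(s, a) scores pairs.
policy = nn.Sequential(nn.Linear(S_DIM, 64), nn.Tanh(), nn.Linear(64, A_DIM))
disc = nn.Sequential(nn.Linear(S_DIM + A_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa, gen_sa):
    # Train D to separate expert pairs (label 1) from generated pairs (label 0).
    logits_e, logits_g = disc(expert_sa), disc(gen_sa)
    loss = bce(logits_e, torch.ones_like(logits_e)) + \
           bce(logits_g, torch.zeros_like(logits_g))
    d_opt.zero_grad(); loss.backward(); d_opt.step()

def surrogate_reward(sa):
    # D's output acts as a surrogate reward: high where a pair looks expert-like.
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(disc(sa)) + 1e-8).squeeze(-1)

def policy_step(states, actions):
    # REINFORCE-style update on the surrogate reward (stand-in for TRPO [14]).
    dist = torch.distributions.Normal(policy(states), 1.0)
    logp = dist.log_prob(actions).sum(-1)
    loss = -(logp * surrogate_reward(torch.cat([states, actions], -1))).mean()
    pi_opt.zero_grad(); loss.backward(); pi_opt.step()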
Methods
  • Suppose the authors can obtain a mixed set of labeled demonstrations with multiple expert modalities.
  • In this paper, the authors propose to learn one policy simultaneously from multiple expert demonstrations.
  • The expert policy comprising multiple skill labels is presented as πE = {πE1, ..., πEk}, where each component is determined by p(π|c) and c is the skill label.
  • To infer skill labels from generator observations adaptively, instead of specifying them manually, and to reconstruct the multi-modal policy simultaneously, a novel adversarial imitation framework is introduced, sketched below.
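Below is a minimal sketch of one Triple-GAIL training iteration, assuming discrete one-hot skill labels, small MLP networks, and a direct adversarial generator update for brevity. The paper's actual algorithm trains the policy with TRPO on the discriminator's surrogate reward and weights the two fake-data paths, so this is a structural illustration rather than the paper's exact procedure.

import torch
import torch.nn as nn
import torch.nn.functional as F

S, A, K = 4, 2, 3  # assumed state dim, action dim, and number of skills

# Three players: generator pi(a|s,c), skill selector C(c|s,a), and
# discriminator D(s,a,c) over state-action-skill triples.
gen = nn.Sequential(nn.Linear(S + K, 64), nn.Tanh(), nn.Linear(64, A))
sel = nn.Sequential(nn.Linear(S + A, 64), nn.Tanh(), nn.Linear(64, K))
dis = nn.Sequential(nn.Linear(S + A + K, 64), nn.Tanh(), nn.Linear(64, 1))
opt = {m: torch.optim.Adam(m.parameters(), lr=3e-4) for m in (gen, sel, dis)}

def train_step(s_e, a_e, c_e, s_g):
    # Expert batch (s_e, a_e) with one-hot skill labels c_e; rollout states s_g.
    # Generator path: act under randomly sampled skill labels.
    c_g = F.one_hot(torch.randint(K, (len(s_g),)), K).float()
    with torch.no_grad():
        a_g = gen(torch.cat([s_g, c_g], -1))
        # Selector path: re-infer labels for generated pairs, giving the
        # discriminator a second fake-data source (the augmentation path).
        c_s = F.gumbel_softmax(sel(torch.cat([s_g, a_g], -1)), hard=True)

    # Discriminator update: expert triples real, generated triples fake.
    real = dis(torch.cat([s_e, a_e, c_e], -1))
    fake = dis(torch.cat([s_g, a_g, c_s], -1))
    d_loss = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) \
           + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    opt[dis].zero_grad(); d_loss.backward(); opt[dis].step()

    # Selector update: fool D on generated data, plus a supervised
    # cross-entropy on labeled expert data (standing in for the R_E term).
    c_s = F.gumbel_softmax(sel(torch.cat([s_g, a_g], -1)), hard=True)
    adv = dis(torch.cat([s_g, a_g, c_s], -1))
    s_loss = F.binary_cross_entropy_with_logits(adv, torch.ones_like(adv)) \
           + F.cross_entropy(sel(torch.cat([s_e, a_e], -1)), c_e.argmax(-1))
    opt[sel].zero_grad(); s_loss.backward(); opt[sel].step()

    # Generator update: a direct adversarial step here for brevity
    # (the paper uses TRPO on the discriminator's surrogate reward).
    a_g2 = gen(torch.cat([s_g, c_g], -1))
    g_adv = dis(torch.cat([s_g, a_g2, c_g], -1))
    g_loss = F.binary_cross_entropy_with_logits(g_adv, torch.ones_like(g_adv))
    opt[gen].zero_grad(); g_loss.backward(); opt[gen].step()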
Results
  • Ablation study: to further assess the performance of the selector and the joint optimization of the selector and the generator, the selection accuracies of driving skills are compared in Table 2.
  • The authors show that both CGAIL and Triple-GAIL achieve high selection accuracies, with Triple-GAIL slightly higher, reaching up to 90%.
  • The win rates under these two targeted instructions reach up to 90%, but fall below 60% when the agents run against mismatched built-in agents, as shown in Table 3
Conclusion
  • The authors propose Triple-GAIL, a novel multi-modal GAIL framework that learns skill selection and imitation jointly from both expert demonstrations and continuously generated experiences by introducing an auxiliary selector.
  • The authors provide theoretical guarantees on convergence to the optimum for both the generator and the selector (a plausible form of the underlying objective is sketched after this list).
  • Experiments on a driving task and a real-time strategy game demonstrate that Triple-GAIL fits multi-modal behaviors closer to the demonstrators' and outperforms state-of-the-art methods
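For orientation, by analogy with Triple-GAN [12] and the GAIL objective [3], the three-player game behind these guarantees plausibly takes a form like the following, where ρ_E and ρ_π denote the expert and generator occupancy distributions, α balances the selector and generator fake-data paths, and R_E, R_G are the supervised losses referenced in Table 2. This is a schematic reconstruction, not the paper's verbatim objective:

\min_{\pi_\theta,\, C_\omega} \; \max_{D_\phi} \;
\mathbb{E}_{(s,a,c) \sim \rho_E}\!\left[\log D_\phi(s,a,c)\right]
+ \alpha \, \mathbb{E}_{(s,a) \sim \rho_\pi,\, c \sim C_\omega(\cdot \mid s,a)}\!\left[\log\!\left(1 - D_\phi(s,a,c)\right)\right]
+ (1-\alpha) \, \mathbb{E}_{(s,a,c) \sim \rho_\pi}\!\left[\log\!\left(1 - D_\phi(s,a,c)\right)\right]
+ \mathcal{R}_E + \mathcal{R}_G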
Tables
  • Table 1: Success rate, mean distance, and KL divergence of different algorithms
  • Table 2: Selection accuracies of driving skills. Triple-GAIL\R_E removes the supervised loss R_E, while Triple-GAIL\R_G removes the supervised loss R_G
  • Table 3: Win rates of different algorithms competing with built-in agents over 10k games. "+label" denotes that the inferred labels are replaced by true labels. "Matched" denotes agents running against the targeted built-in agents, while "Mismatched" denotes agents running against the mismatched built-in agents
References
  • [1] P. Abbeel and A. Y. Ng, "Apprenticeship learning via inverse reinforcement learning," in Proceedings of the Twenty-First International Conference on Machine Learning, 2004, pp. 1–8.
  • [2] B. Fang, S. Jia, D. Guo, M. Xu, S. Wen, and F. Sun, "Survey of imitation learning for robotic manipulation," International Journal of Intelligent Robotics and Applications, pp. 1–8, 2019.
  • [3] J. Ho and S. Ermon, "Generative adversarial imitation learning," in Advances in Neural Information Processing Systems, 2016, pp. 4565–4573.
  • [4] S. Reddy, A. D. Dragan, and S. Levine, "SQIL: Imitation learning via regularized behavioral cloning," arXiv preprint arXiv:1905.11108, 2019.
  • [5] M. Wulfmeier, P. Ondruska, and I. Posner, "Maximum entropy deep inverse reinforcement learning," arXiv preprint arXiv:1507.04888, 2015.
  • [6] M. Pflueger, A. Agha, and G. S. Sukhatme, "Rover-IRL: Inverse reinforcement learning with soft value iteration networks for planetary rover path planning," IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1387–1394, 2019.
  • [7] Y. Li, J. Song, and S. Ermon, "InfoGAIL: Interpretable imitation learning from visual demonstrations," in Advances in Neural Information Processing Systems, 2017, pp. 3812–3822.
  • [8] A. Kuefler and M. J. Kochenderfer, "Burn-in demonstrations for multi-modal imitation learning," in Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018, pp. 1071–1078.
  • [9] Z. Wang, J. S. Merel, S. E. Reed, N. de Freitas, G. Wayne, and N. Heess, "Robust imitation of diverse behaviors," in Advances in Neural Information Processing Systems, 2017, pp. 5320–5329.
  • [10] J. Merel, Y. Tassa, S. Srinivasan, J. Lemmon, Z. Wang, G. Wayne, and N. Heess, "Learning human behaviors from motion capture by adversarial imitation," arXiv preprint arXiv:1707.02201, 2017.
  • [11] J. Lin and Z. Zhang, "ACGAIL: Imitation learning about multiple intentions with auxiliary classifier GANs," in Pacific Rim International Conference on Artificial Intelligence, Springer, 2018, pp. 321–334.
  • [12] C. Li, T. Xu, J. Zhu, and B. Zhang, "Triple generative adversarial nets," in Advances in Neural Information Processing Systems, 2017, pp. 4088–4098.
  • [13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
  • [14] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in International Conference on Machine Learning, 2015, pp. 1889–1897.
  • [15] M. Henaff, A. Canziani, and Y. LeCun, "Model-predictive policy learning with uncertainty regularization for driving in dense traffic," arXiv preprint arXiv:1901.02705, 2019.
  • [16] J. Halkias and J. Colyar, "Next generation simulation fact sheet," US Department of Transportation: Federal Highway Administration, 2006.
  • [17] Y. Tian, Q. Gong, W. Shang, Y. Wu, and C. L. Zitnick, "ELF: An extensive, lightweight and flexible research platform for real-time strategy games," in Advances in Neural Information Processing Systems, 2017, pp. 2659–2669.