A Survey of Reinforcement Learning Informed by Natural Language

IJCAI, pp. 6309-6317, 2019.

Abstract:

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making problems. We thus argue that the time is right to investigate a tight integration of natural language understanding into RL in particular. We survey the state of the field, including work on instruction following, text games, and learning from textual domain knowledge. Finally, we call for the development of new environments as well as further investigation into the potential uses of recent Natural Language Processing (NLP) techniques for such tasks.

Introduction
  • Inspired by gaps in the existing literature, the authors advocate the development of new research environments utilizing domain knowledge in natural language, as well as a wider use of NLP methods such as pre-trained language models and parsers to inform RL agents about the structure of the world.
Highlights
  • Languages, whether natural or formal, allow us to encode abstractions, to generalize, to communicate plans, intentions, and requirements, both to other parties and to ourselves [Gopnik and Meltzoff, 1987]
  • Agents trained with traditional approaches within dominant paradigms such as Reinforcement Learning (RL) and Imitation Learning (IL) typically lack such capabilities, and struggle to learn efficiently from interactions with rich and diverse environments.
  • We argue that the time has come for natural language to become a first-class citizen of solutions to sequential decision-making problems.
  • Several trends are evident: (i) studies of language-conditional RL outnumber studies of language-assisted RL, (ii) learning from task-dependent text is more common than learning from task-independent text, (iii) within work studying transfer from task-dependent text, only a handful of papers study how to use unstructured and descriptive text, (iv) only a few papers explore methods for structuring internal plans and building compositional representations using the structure of language, and (v) natural language, as opposed to synthetically generated languages, is still not the standard in research on instruction following (a minimal sketch of a language-conditional policy follows this list).
  • The currently predominant way RL agents are trained restricts their use to environments where all information about the policy can be gathered from directly acting in, and receiving reward from, the environment. This tabula rasa learning results in low sample efficiency and poor performance when transferring to other environments.
  • While there is a growing body of papers that incorporate language into RL, most of the research effort has focused on simple RL tasks and synthetic languages with highly structured and instructive text.
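To make the language-conditional setting concrete, here is a minimal sketch (ours, not code from any surveyed paper) of a policy whose action logits are conditioned on an instruction, in the spirit of the gated-attention architecture of [Chaplot et al., 2018]. All module sizes, names, and the toy inputs are illustrative assumptions.

    import torch
    import torch.nn as nn

    class LanguageConditionalPolicy(nn.Module):
        """Policy whose action logits depend on state and instruction."""

        def __init__(self, vocab_size, state_dim, hidden_dim, num_actions):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_dim)
            self.instr_encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
            self.state_encoder = nn.Linear(state_dim, hidden_dim)
            self.policy_head = nn.Linear(hidden_dim, num_actions)

        def forward(self, state, instruction_tokens):
            # Encode the instruction into one vector (last GRU hidden state).
            _, h_n = self.instr_encoder(self.embed(instruction_tokens))
            gate = torch.sigmoid(h_n[-1])                  # (batch, hidden_dim)
            state_feats = torch.relu(self.state_encoder(state))
            # Gated attention: the instruction multiplicatively modulates
            # the state features before the policy head.
            return self.policy_head(state_feats * gate)    # action logits

    # Usage: sample an action for one (state, instruction) pair.
    policy = LanguageConditionalPolicy(vocab_size=100, state_dim=16,
                                       hidden_dim=32, num_actions=4)
    logits = policy(torch.randn(1, 16), torch.tensor([[5, 17, 42]]))
    action = torch.distributions.Categorical(logits=logits).sample()

In language-assisted RL, by contrast, the same language signal would be auxiliary (e.g., shaping rewards or representations) rather than part of the task specification.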
Results
  • In reviewing efforts that integrate language into RL, the authors highlight work that develops tools, approaches, or insights they believe may be valuable for improving the generalization or sample efficiency of learning agents through the use of natural language.
  • [Hu et al., 2019] consider generated natural language as a representation for macro-actions in a real-time strategy game environment based on [Tian et al., 2017].
  • The authors believe that several factors make focusing such efforts worthwhile: (i) recent progress in pre-training language models, (ii) general advances in representation learning, and (iii) the development of tools that make constructing rich and challenging RL environments easier.
  • Preliminary results that demonstrate zero-shot capabilities [Radford et al., 2019] suggest that a relatively small dataset of instructions or descriptions could suffice to ground and utilize task-dependent information for better sample efficiency and generalization of RL agents.
  • The authors believe that learning representations for transferring knowledge about analogies, going beyond using analogies as auxiliary tasks [Oh et al., 2017], will play an important role in generalizing to unseen instructions.
  • In works studying transfer from descriptive task-dependent language corpora [Narasimhan et al., 2018], natural language sentences could be embedded using representations from pretrained language models (see the sketch below).
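As a hedged illustration of that last point (our sketch; not a pipeline from any surveyed paper), a frozen pretrained model such as BERT [Devlin et al., 2018] could map instructions or descriptions to fixed-size vectors that condition an agent; the model choice and mean pooling are assumptions.

    # Sketch: embedding task text with a frozen pretrained LM.
    # Assumes the Hugging Face `transformers` and `torch` packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    lm = AutoModel.from_pretrained("bert-base-uncased")
    lm.eval()  # frozen: used as a feature extractor, not fine-tuned

    @torch.no_grad()
    def embed_text(sentences):
        """Map a list of sentences to one mean-pooled vector each."""
        batch = tokenizer(sentences, padding=True, truncation=True,
                          return_tensors="pt")
        hidden = lm(**batch).last_hidden_state        # (batch, seq, 768)
        mask = batch["attention_mask"].unsqueeze(-1)  # drop padding tokens
        return (hidden * mask).sum(1) / mask.sum(1)   # (batch, 768)

    # The resulting vector can be concatenated with, or gate, the
    # agent's state features, as in the earlier sketch.
    goal = embed_text(["collect the red key, then open the north door"])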
Conclusion
  • Integrating and fine-tuning pretrained information retrieval and machine reading systems similar to [Chen et al., 2017a] with RL agents that query them could help in extracting and utilizing relevant information from unstructured task-specific language corpora, such as the game manuals used in [Branavan et al., 2012] (a sketch follows this list).
  • Such research requires the development of more challenging environments that reflect the semantics and diversity of the real world.
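A hedged sketch of the first proposal above (ours, not an implementation from the survey): a pretrained extractive question-answering model reads a game manual, and the agent folds the extracted answer into its decision making. The `transformers` QA pipeline is real; the manual text and questions are invented examples, and a practical system would also fine-tune the reader, as the bullet suggests.

    # Sketch: an agent querying a pretrained machine-reading model over
    # a game manual, in the spirit of [Chen et al., 2017a] applied to
    # manuals as in [Branavan et al., 2012]. Requires `transformers`.
    from transformers import pipeline

    reader = pipeline("question-answering")  # default extractive QA model

    MANUAL = (
        "Archers are strong against infantry but weak against cavalry. "
        "Cities founded near rivers grow faster."
    )

    def query_manual(question: str) -> str:
        """Extract the span of the manual that best answers the question."""
        return reader(question=question, context=MANUAL)["answer"]

    # During planning, the agent phrases its current situation as a
    # question and conditions on the answer, e.g. by embedding it with
    # the pretrained LM from the previous sketch.
    advice = query_manual("What are archers weak against?")
    print(advice)  # expected: "cavalry"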
Funding
  • This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 637713).
  • JL has been funded by an EPSRC Doctoral Training Partnership and an Oxford-DeepMind Scholarship, and GF has been funded by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems.
References
  • [Agarwal et al., 2019] Rishabh Agarwal, Chen Liang, Dale Schuurmans, and Mohammad Norouzi. Learning to Generalize from Sparse and Underspecified Rewards. arXiv preprint arXiv:1902.07198, 2019.
  • [Anderson et al., 2018] Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, and Anton van den Hengel. Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments. In CVPR, 2018.
  • [Andreas and Klein, 2015] Jacob Andreas and Dan Klein. Alignment-based compositional semantics for instruction following. In ACL, 2015.
  • [Andreas et al., 2016] Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In CVPR, 2016.
  • [Andreas et al., 2017] Jacob Andreas, Dan Klein, and Sergey Levine. Modular Multitask Reinforcement Learning with Policy Sketches. In ICML, 2017.
  • [Andreas et al., 2018] Jacob Andreas, Dan Klein, and Sergey Levine. Learning with latent language. In NAACL-HLT, 2018.
  • [Antol et al., 2015] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. VQA: Visual question answering. In ICCV, 2015.
  • [Artzi and Zettlemoyer, 2013] Yoav Artzi and Luke Zettlemoyer. Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions. In ACL, 2013.
  • [Arulkumaran et al., 2017] Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. A Brief Survey of Deep Reinforcement Learning. IEEE Signal Processing Magazine, 2017.
  • [Babes et al., 2011] Monica Babes, Vukosi Marivate, Kaushik Subramanian, and Michael L Littman. Apprenticeship learning about multiple intentions. In ICML, 2011.
  • [Bahdanau et al., 2019] Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, and Edward Grefenstette. Learning to Understand Goal Specifications by Modelling Reward. In ICLR, 2019.
  • [Banko et al., 2007] Michele Banko, Michael J Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. Open information extraction from the web. In IJCAI, 2007.
  • [Barto and Mahadevan, 2003] Andrew G Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(1-2):41–77, 2003.
  • [Bisk et al., 2016] Yonatan Bisk, Deniz Yuret, and Daniel Marcu. Natural language communication with robots. In ACL, 2016.
  • [Bouziane et al., 2015] Abdelghani Bouziane, Djelloul Bouchiha, Noureddine Doumi, and Mimoun Malki. Question answering systems: survey and trends. Procedia Computer Science, 73:366–375, 2015.
  • [Branavan et al., 2010] S. R. K. Branavan, Luke S Zettlemoyer, and Regina Barzilay. Reading between the lines: Learning to map high-level instructions to commands. In ACL, 2010.
  • [Branavan et al., 2012] S. R. K. Branavan, David Silver, and Regina Barzilay. Learning to Win by Reading Manuals in a Monte-Carlo Framework. JAIR, 2012.
  • [Chaplot et al., 2018] Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, and Ruslan Salakhutdinov. Gated-Attention Architectures for Task-Oriented Language Grounding. In AAAI, 2018.
  • [Chen and Mooney, 2011] David L Chen and Raymond J Mooney. Learning to interpret natural language navigation instructions from observations. In AAAI, 2011.
  • [Chen et al., 2017a] Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading Wikipedia to answer open-domain questions. In ACL, 2017.
  • [Chen et al., 2017b] Hongshen Chen, Xiaorui Liu, Dawei Yin, and Jiliang Tang. A survey on dialogue systems: Recent advances and new frontiers. ACM SIGKDD Explorations Newsletter, 2017.
  • [Chen et al., 2018] Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, and Yoav Artzi. Touchdown: Natural language navigation and spatial reasoning in visual street environments. arXiv preprint arXiv:1811.12354, 2018.
  • [Chevalier-Boisvert et al., 2019] Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, and Yoshua Bengio. BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning. In ICLR, 2019.
  • [Côté et al., 2018] Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, Wendy Tay, and Adam Trischler. TextWorld: A Learning Environment for Text-based Games. arXiv preprint arXiv:1806.11532, 2018.
  • [Das et al., 2018a] Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, and Dhruv Batra. Embodied Question Answering. In CVPR, 2018.
  • [Das et al., 2018b] Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, and Dhruv Batra. Neural Modular Control for Embodied Question Answering. In CoRL, 2018.
  • [Deerwester et al., 1990] Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
  • [DePristo and Zubek, 2001] Mark A DePristo and Robert Zubek. being-in-the-world. In AAAI, 2001.
  • [Devlin et al., 2018] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805, 2018.
  • [Dinan et al., 2018] Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, and Jason Weston. Wizard of Wikipedia: Knowledge-powered conversational agents. arXiv preprint arXiv:1811.01241, 2018.
  • [Eisenstein et al., 2009] Jacob Eisenstein, James Clarke, Dan Goldwasser, and Dan Roth. Reading to learn: constructing features from semantic abstracts. In ACL, 2009.
  • [Firth, 1957] John R Firth. A synopsis of linguistic theory, 1957.
  • [Frome et al., 2013] Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc Aurelio Ranzato, and Tomas Mikolov. DeViSE: A Deep Visual-Semantic Embedding Model. In NIPS, 2013.
  • [Fu et al., 2019] Justin Fu, Anoop Korattikara, Sergey Levine, and Sergio Guadarrama. From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following. In ICLR, 2019.
  • [Fulda et al., 2017] Nancy Fulda, Daniel Ricks, Ben Murdoch, and David Wingate. What can you do with a rock? affordance extraction via word embeddings. arXiv preprint arXiv:1703.03429, 2017.
  • [Goldberg, 2019] Yoav Goldberg. Assessing BERT’s Syntactic Abilities. arXiv preprint arXiv:1901.05287, 2019.
  • [Gopnik and Meltzoff, 1987] Alison Gopnik and Andrew Meltzoff. The development of categorization in the second year and its relation to other cognitive and linguistic developments. Child Development, 1987.
  • [Gordon et al., 2018] Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, and Ali Farhadi. IQA: Visual question answering in interactive environments. In CVPR, 2018.
  • [Goyal et al., 2019] Prasoon Goyal, Scott Niekum, and Raymond J. Mooney. Using Natural Language for Reward Shaping in Reinforcement Learning. In IJCAI, 2019.
  • [Hermann et al., 2017] Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, and Phil Blunsom. Grounded Language Learning in a Simulated 3D World. arXiv preprint arXiv:1706.06551, 2017.
  • [Ho and Ermon, 2016] Jonathan Ho and Stefano Ermon. Generative Adversarial Imitation Learning. In NIPS, 2016.
  • [Howard and Ruder, 2018] Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. In ACL, 2018.
  • [Hu et al., 2019] Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, and Mike Lewis. Hierarchical decision making by generating and following natural language instructions. arXiv preprint arXiv:1906.00744, 2019.
  • [Infocom, 1980] Infocom. Zork I, 1980.
  • [Janner et al., 2018] Michael Janner, Karthik Narasimhan, and Regina Barzilay. Representation learning for grounded spatial reasoning. TACL, 2018.
  • [Johnson et al., 2016] Matthew Johnson, Katja Hofmann, Tim Hutton, and David Bignell. The malmo platform for artificial intelligence experimentation. In IJCAI, pages 4246–4247, 2016.
  • [Johnson et al., 2017] Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In CVPR, 2017.
  • [Kollar et al., 2010] Thomas Kollar, Stefanie Tellex, Deb Roy, and Nicholas Roy. Toward understanding natural language directions. In HRI, 2010.
  • [Kostka et al., 2017] B. Kostka, J. Kwiecień, J. Kowalski, and P. Rychlikowski. Text-based adventures of the Golovin AI agent. In Conference on Computational Intelligence and Games (CIG), 2017.
  • [Kuhlmann et al., 2004] Gregory Kuhlmann, Peter Stone, Raymond Mooney, and Jude Shavlik. Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer. 2004.
  • [MacGlashan et al., 2015] James MacGlashan, Monica Babes-Vroman, Marie desJardins, Michael L. Littman, Smaranda Muresan, Shawn Squire, Stefanie Tellex, Dilip Arumugam, and Lei Yang. Grounding english commands to reward functions. In Robotics: Science and Systems XI, 2015.
  • [MacMahon et al., 2006] Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers. Walk the talk: Connecting language, knowledge, and action in route instructions. In AAAI, 2006.
  • [Massiceti et al., 2018] Daniela Massiceti, N Siddharth, Puneet K Dokania, and Philip HS Torr. FlipDial: A generative model for two-way visual dialogue. In CVPR, 2018.
  • [Mei et al., 2016] Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences. In AAAI, 2016.
  • [Mey, 1993] J. Mey. Pragmatics: An Introduction. Blackwell, 1993.
  • [Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In NIPS, 2013.
  • [Misra et al., 2017] Dipendra Misra, John Langford, and Yoav Artzi. Mapping Instructions and Visual Observations to Actions with Reinforcement Learning. In EMNLP, 2017.
  • [Narasimhan et al., 2015] Karthik Narasimhan, Tejas D. Kulkarni, and Regina Barzilay. Language understanding for text-based games using deep reinforcement learning. In EMNLP, 2015.
  • [Narasimhan et al., 2018] Karthik Narasimhan, Regina Barzilay, and Tommi Jaakkola. Grounding Language for Transfer in Deep Reinforcement Learning. JAIR, 2018.
  • [Oh et al., 2017] Junhyuk Oh, Satinder P. Singh, Honglak Lee, and Pushmeet Kohli. Zero-shot task generalization with multi-task deep reinforcement learning. In ICML, 2017.
  • [Osa et al., 2018] Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J Andrew Bagnell, Pieter Abbeel, Jan Peters, et al. An algorithmic perspective on imitation learning. Foundations and Trends® in Robotics, 7(1-2):1–179, 2018.
  • [Peters et al., 2018a] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In NAACL, 2018.
  • [Peters et al., 2018b] Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. Dissecting contextual word embeddings: Architecture and representation. In EMNLP, 2018.
  • [Radford et al., 2019] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
  • [Shu et al., 2018] Tianmin Shu, Caiming Xiong, and Richard Socher. Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning. In ICLR, 2018.
  • [Shusterman et al., 2011] Anna Shusterman, Sang Ah Lee, and Elizabeth Spelke. Cognitive effects of language on human navigation. Cognition, 2011.
  • [Silver et al., 2017] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 2017.
  • [Singh et al., 2002] Satinder Singh, Diane Litman, Michael Kearns, and Marilyn Walker. Optimizing dialogue management with reinforcement learning: Experiments with the njfun system. JAIR, 2002.
  • [Socher et al., 2013] Richard Socher, Milind Ganjoo, Christopher D. Manning, and Andrew Y. Ng. Zero-shot Learning Through Cross-modal Transfer. In NIPS, 2013.
  • [Spelke and Kinzler, 2007] Elizabeth Spelke and Katherine D Kinzler. Core knowledge. Developmental Science, 2007.
  • [Sutton and Barto, 2018] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
  • [Tellex et al., 2011] Stefanie Tellex, Thomas Kollar, Steven Dickerson, Matthew R Walter, Ashis Gopal Banerjee, Seth Teller, and Nicholas Roy. Understanding natural language commands for robotic navigation and mobile manipulation. In AAAI, 2011.
  • [Tenney et al., 2019] Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Sam Bowman, Dipanjan Das, and Ellie Pavlick. What do you learn from context? probing for sentence structure in contextualized word representations. In ICLR, 2019.
  • [Tesauro, 1995] Gerald Tesauro. Temporal Difference Learning and TD-Gammon. Communications of the ACM, 1995.
  • [Tian et al., 2017] Yuandong Tian, Qucheng Gong, Wenling Shang, Yuxin Wu, and C. Lawrence Zitnick. ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games. In NIPS, 2017.
  • [Torrado et al., 2018] Ruben Rodriguez Torrado, Philip Bontrager, Julian Togelius, Jialin Liu, and Diego Perez-Liebana. Deep reinforcement learning for general video game AI. In CIG. IEEE, 2018.
  • [Tsividis et al., 2017] Pedro Tsividis, Thomas Pouncy, Jaqueline L. Xu, Joshua B. Tenenbaum, and Samuel J. Gershman. Human learning in Atari. In AAAI, 2017.
  • [Wang et al., 2016] Sida I Wang, Percy Liang, and Christopher D Manning. Learning Language Games through Interaction. In ACL, 2016.
  • [Wang et al., 2019] Xin Wang, Qiuyuan Huang, Asli Çelikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, and Lei Zhang. Reinforced crossmodal matching and self-supervised imitation learning for vision-language navigation. In CVPR, 2019.
  • [White and Sofge, 1992] David Ashley White and Donald A Sofge. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptative Approaches. Van Nostrand Reinhold Company, 1992.
  • [Yan et al., 2018] Claudia Yan, Dipendra Misra, Andrew Bennnett, Aaron Walsman, Yonatan Bisk, and Yoav Artzi. Chalet: Cornell house agent learning environment. arXiv preprint arXiv:1801.07357, 2018.
  • [Yu et al., 2018] Haonan Yu, Haichao Zhang, and Wei Xu. Interactive Grounded Language Acquisition and Generalization in a 2D World. In ICLR, 2018.
  • [Yuan et al., 2018] Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, and Adam Trischler. Counting to Explore and Generalize in Text-based Games. arXiv preprint arXiv:1806.11525, 2018.
  • [Zellers et al., 2018] Rowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin Choi. SWAG: A large-scale adversarial dataset for grounded commonsense inference. In EMNLP, 2018.
  • [Ziebart et al., 2008] Brian D Ziebart, Andrew Maas, J Andrew Bagnell, and Anind K Dey. Maximum Entropy Inverse Reinforcement Learning. In AAAI, 2008.
  • [Zipf, 1949] George Kingsley Zipf. Human behavior and the principle of least effort. 1949.