Reinforcement Learning

Reinforcement learning is an area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take ''actions'' in an ''environment'' so as to maximize some notion of cumulative ''reward''. The problem is studied in many other disciplines, such as game theory, control theory, operations research, information theory, and simulation-based optimization. In the operations research and control literature, reinforcement learning is called ''approximate dynamic programming''. The approach has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with learning or approximation. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality. In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible. Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Instead the focus is on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off has been most thoroughly studied through the multi-armed bandit problem and for finite MDPs.
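As a concrete illustration of the exploration/exploitation balance mentioned above, here is a minimal ε-greedy agent on a toy multi-armed bandit. The payout probabilities, ε, and step count are illustrative assumptions rather than anything taken from the papers listed below.

```python
import random

# Toy multi-armed bandit: each arm pays 1 with a fixed (unknown) probability.
ARM_PROBS = [0.2, 0.5, 0.7]   # assumed payout probabilities, for illustration only
EPSILON = 0.1                 # exploration rate
STEPS = 10_000

counts = [0] * len(ARM_PROBS)    # how often each arm was pulled
values = [0.0] * len(ARM_PROBS)  # running estimate of each arm's expected reward

total_reward = 0.0
for _ in range(STEPS):
    if random.random() < EPSILON:
        arm = random.randrange(len(ARM_PROBS))  # explore: pick a random arm
    else:
        arm = max(range(len(ARM_PROBS)), key=lambda a: values[a])  # exploit: best estimate
    reward = 1.0 if random.random() < ARM_PROBS[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    total_reward += reward

print("estimated arm values:", [round(v, 3) for v in values])
print("average reward:", total_reward / STEPS)
```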
Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, Igor Mordatch
ICLR, (2020)
Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated...
Cited by 79 · Views 341
Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine
arXiv: Learning, (2019)
Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (meta-RL) algorithms enable agents to learn new skills from small amounts of experience, several major challenges preclude their ...
Cited by 69 · Views 178
Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), (2019): 9653-9664
We believe that our method addresses a major limitation in meta-reinforcement learning: while meta-reinforcement learning algorithms can effectively acquire adaptation procedures that can learn new tasks at meta-test time with just a few samples, they are extremely expensive in terms o...
Cited by 17 · Views 216
international conference on machine learning, (2019)
We take a step further by studying the distributions learned by Quantile Regression-DQN, and discover that the composite effect of intrinsic and parametric uncertainties makes efficient exploration challenging
Cited by 6 · Views 79
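The entry above studies the distributions learned by Quantile Regression-DQN; as background, here is a minimal sketch of the quantile-regression (pinball) loss such methods use to fit a set of return quantiles. The toy target distribution and the plain gradient-descent loop are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def quantile_regression_loss(pred_quantiles, target_samples):
    """Pinball loss for fitting N return quantiles to a set of target samples
    (e.g. Bellman targets), as used by quantile-regression-style distributional RL."""
    N = pred_quantiles.shape[0]
    taus = (np.arange(N) + 0.5) / N                         # quantile midpoints
    u = target_samples[None, :] - pred_quantiles[:, None]   # pairwise errors, shape (N, M)
    # Asymmetric absolute loss: under- and over-estimation weighted by tau and (1 - tau).
    return np.where(u >= 0, taus[:, None] * u, (taus[:, None] - 1.0) * u).mean()

# Toy usage: fit 5 quantiles to samples from an assumed return distribution by
# following the (sub)gradient of the pinball loss.
rng = np.random.default_rng(0)
targets = rng.normal(loc=1.0, scale=0.5, size=256)
quantiles = np.zeros(5)
taus = (np.arange(5) + 0.5) / 5
for _ in range(5000):
    u = targets[None, :] - quantiles[:, None]
    grad = np.where(u >= 0, -taus[:, None], 1.0 - taus[:, None]).mean(axis=1)
    quantiles -= 0.01 * grad
print(np.round(quantiles, 2))   # roughly the 10th, 30th, ..., 90th percentiles of the targets
print(round(quantile_regression_loss(quantiles, targets), 4))
```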
Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann
NeurIPS, pp.1785-1796, (2019)
Cited by 5 · Views 73
Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
national conference on artificial intelligence, (2019)
We show that even when partial policy evaluation is performed and noise is added to it, along with a noisy policy improvement stage, the above Policy Iteration scheme converges with a γ^h contraction coefficient
Cited by 5 · Views 120
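The entry above analyses policy iteration with partial policy evaluation and noise; as a generic, noise-free point of reference (not the paper's h-step lookahead scheme), here is a tabular sketch of modified policy iteration, where each greedy improvement follows only a few evaluation sweeps, on an assumed toy MDP.

```python
import numpy as np

# Tiny tabular MDP with assumed random dynamics, purely for illustration.
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a next-state distribution
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # expected immediate reward

def modified_policy_iteration(m_eval_sweeps=3, iterations=50):
    """Policy iteration with *partial* policy evaluation: each iteration runs only
    m_eval_sweeps Bellman backups for the current policy before a greedy
    improvement step, instead of evaluating the policy exactly."""
    V = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    idx = np.arange(n_states)
    for _ in range(iterations):
        # Partial evaluation of the current policy.
        for _ in range(m_eval_sweeps):
            V = R[idx, policy] + gamma * (P[idx, policy] @ V)
        # Greedy improvement with respect to the partially evaluated V.
        Q = R + gamma * (P @ V)            # shape (n_states, n_actions)
        policy = Q.argmax(axis=1)
    return V, policy

V, pi = modified_policy_iteration()
print("greedy policy:", pi)
```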
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), (2019): 14111-14121
We believe a possible reason could be that, since such low values are very different from the original Deep Q-Networks settings, some of the other Deep Q-Networks hyper-parameters might no longer be ideal in the low discount factor region
Cited by 4 · Views 76
national conference on artificial intelligence, (2018)
We have demonstrated that several improvements to the Deep Q-Networks (DQN) algorithm can be successfully integrated into a single learning algorithm that achieves state-of-the-art performance
Cited by 563 · Views 310
international conference on learning representations, (2018)
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent’s policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with ...
Cited by 339 · Views 273
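A minimal sketch of the idea described in the NoisyNet entry above: a linear layer whose weight perturbations have a learnable scale, so the induced stochasticity of the policy drives exploration and is tuned by gradient descent along with the other weights. This simplified version uses independent Gaussian noise per weight (the paper also considers a factorised variant); the initialisation constants here are assumptions.

```python
import math
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer whose weights are perturbed by learned, parametric Gaussian noise."""
    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)

    def forward(self, x):
        if self.training:
            # Fresh noise each forward pass; sigma is learned, so the amount of
            # exploration adapts during training.
            weight = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_sigma)
            bias = self.bias_mu + self.bias_sigma * torch.randn_like(self.bias_sigma)
        else:
            weight, bias = self.weight_mu, self.bias_mu
        return torch.nn.functional.linear(x, weight, bias)

# Usage: drop-in replacement for nn.Linear in a Q-network head.
layer = NoisyLinear(128, 4)
q_values = layer(torch.randn(32, 128))
```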
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), (2018): 4754-4765
Our results show that model-based reinforcement learning with neural network dynamics models can achieve results that are competitive with Bayesian nonparametric models such as Gaussian processes, and on par with model-free algorithms such as policy optimization and Soft actor cr...
Cited by 261 · Views 126
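The entry above concerns model-based RL with neural-network dynamics models; as a generic point of reference (not the paper's specific uncertainty-aware method), here is a minimal random-shooting planner that picks actions by simulating candidate action sequences through an assumed, already-trained model. The dummy dynamics and reward functions are illustrative stand-ins.

```python
import numpy as np

def random_shooting_action(dynamics_fn, reward_fn, state, action_dim,
                           horizon=15, n_candidates=500, rng=None):
    """Pick the first action of the best random action sequence under a learned model.

    dynamics_fn(state, action) -> next_state   (stand-in for a trained neural network)
    reward_fn(state, action)   -> scalar reward
    Plain random-shooting MPC; ensembles and uncertainty handling are omitted here.
    """
    rng = rng or np.random.default_rng()
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))  # assumed action bounds
        s, total = state, 0.0
        for a in seq:
            total += reward_fn(s, a)
            s = dynamics_fn(s, a)
        if total > best_return:
            best_return, best_first_action = total, seq[0]
    return best_first_action

# Toy usage with dummy scalar dynamics standing in for a learned model.
dyn = lambda s, a: 0.9 * s + 0.1 * float(a.sum())
rew = lambda s, a: -abs(s)            # drive the scalar state toward zero
print(random_shooting_action(dyn, rew, state=1.0, action_dim=2))
```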
Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel
ICLR, (2018)
We further evaluate the effect of each key component of our algorithm, showing that both the use of Trust Region Policy Optimization and the model ensemble are essential for successful applications of deep model-based reinforcement learning
Cited by 121 · Views 86
arXiv: Learning, (2018)
We introduce the model-based value expansion method, an algorithm for incorporating predictive models of system dynamics into model-free value function estimation
Cited by 95 · Views 119
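A minimal sketch of the model-based value expansion idea in the entry above: roll a learned dynamics model forward a few steps under the current policy, accumulate the predicted rewards, and bootstrap with a value estimate to form a target for model-free value estimation. The stand-in functions, horizon, and discount are illustrative assumptions.

```python
import numpy as np

def mve_target(state, dynamics_fn, reward_fn, value_fn, policy_fn, horizon=3, gamma=0.99):
    """H-step value-expansion-style target: model rollout plus bootstrapped value.
    dynamics_fn, reward_fn, value_fn, policy_fn are assumed, already-trained functions."""
    s, target, discount = state, 0.0, 1.0
    for _ in range(horizon):
        a = policy_fn(s)
        target += discount * reward_fn(s, a)
        s = dynamics_fn(s, a)
        discount *= gamma
    target += discount * value_fn(s)   # bootstrap at the model-predicted state
    return target

# Toy usage with stand-in functions (illustrative assumptions only).
target = mve_target(
    state=np.array([0.5, -0.2]),
    dynamics_fn=lambda s, a: 0.95 * s + 0.05 * a,
    reward_fn=lambda s, a: float(-np.sum(s ** 2)),
    value_fn=lambda s: float(-np.sum(s ** 2) / (1 - 0.99)),
    policy_fn=lambda s: -s,
)
print(round(target, 3))
```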
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), (2018): 8224-8234
We demonstrated that stochastic ensemble value expansion, an uncertainty-aware approach for merging model-free and model-based reinforcement learning, outperforms model-free approaches while reducing sample complexity by an order of magnitude on several challenging tasks
Cited by 67 · Views 130
international conference on machine learning, (2018)
We introduced policy certificates to improve accountability in reinforcement learning by enabling users to intervene if the guaranteed performance is deemed inadequate
Cited by 47 · Views 101
arXiv: Learning, (2018)
It is clear that the influence reward is essential to achieve any form of learning, attesting to the promise of this idea and highlighting the complexity of learning general deep neural network multi-agent policies
Cited by 39 · Views 107
Barret Zoph, Quoc V. Le
international conference on learning representations, (2017)
Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model ...
Cited by 1726 · Views 221
arXiv: Machine Learning, (2017)
We have explored Evolution Strategies, a class of black-box optimization algorithms, as an alternative to popular Markov Decision Process-based reinforcement learning techniques such as Q-learning and policy gradients
Cited by 676 · Views 194
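The Evolution Strategies entry above describes black-box optimization of policy parameters; here is a minimal sketch of the basic ES update, which perturbs the parameters with Gaussian noise, evaluates the (possibly non-differentiable) objective, and moves along the noise directions weighted by the returns. The toy objective, step size, noise scale, and population size are illustrative assumptions.

```python
import numpy as np

def evolution_strategies(objective_fn, theta, sigma=0.1, lr=0.02,
                         population=50, iterations=200, rng=None):
    """Black-box parameter search in the spirit of Evolution Strategies.
    In an RL setting objective_fn would be an episode return; here it is a toy function."""
    rng = rng or np.random.default_rng(0)
    for _ in range(iterations):
        eps = rng.standard_normal((population, theta.size))             # noise directions
        returns = np.array([objective_fn(theta + sigma * e) for e in eps])
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)   # normalise returns
        theta = theta + lr / (population * sigma) * eps.T @ returns     # returns-weighted step
    return theta

# Toy usage: maximise -||x - 3||^2 (optimum at x = 3).
best = evolution_strategies(lambda x: -np.sum((x - 3.0) ** 2), theta=np.zeros(5))
print(np.round(best, 2))
```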
ICML, (2017): 449-458
We found that learning value distributions is a powerful notion that allows us to surpass most gains previously made on Atari 2600, without further algorithmic adjustments
Cited by 461 · Views 83
Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath
IEEE Signal Processing Magazine, no. 6 (2017): 26-38
Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world. Currently, deep learning is enabling reinforcement learning (RL) ...
Cited by 310 · Views 903
Keywords
Reinforcement Learning, Machine Learning, Artificial Intelligence, Dopamine, Computer Science, Function Approximation, Game Theory, Mean Square Error, Value Function
Authors
David Silver (14 papers)
Pieter Abbeel (11 papers)
Sergey Levine (10 papers)
Koray Kavukcuoglu (6 papers)
Satinder Singh (6 papers)
Marc G. Bellemare (5 papers)
Daan Wierstra (5 papers)
Volodymyr Mnih (5 papers)
Nicolas Heess (5 papers)