Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. It allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance. Only simple reward feedback, known as the reinforcement signal, is required for the agent to learn its behavior.
Reinforcement learning is defined by a specific type of problem, and all its solutions are classed as reinforcement learning algorithms. In the problem, an agent must decide the best action to select based on its current state. The environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context use dynamic programming techniques. The main difference between classical dynamic programming techniques and reinforcement learning algorithms is that the latter do not assume knowledge of an exact model of the MDP, and they target large MDPs where exact methods become impractical. The problem has also been studied in the theory of optimal control, although most of those studies are concerned with the existence of optimal solutions and their characterization, not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to analyze how equilibrium may arise under bounded rationality.
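To make the contrast with classical techniques concrete, the following is a minimal sketch of value iteration, a dynamic programming method that requires the full MDP (transition probabilities and rewards) to be known in advance. The three-state chain, the action names, the table `P` and the discount factor `GAMMA` are all hypothetical and chosen only for illustration.

```python
# Hypothetical 3-state chain MDP: states 0 and 1 are ordinary, state 2 is terminal.
# P[state][action] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {},  # terminal state: no actions available
}
GAMMA = 0.9  # assumed discount factor


def action_value(V, outcomes, gamma):
    """Expected reward plus discounted value of the next state for one action."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def value_iteration(P, gamma, tol=1e-8):
    """Sweep over all states until the state values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:  # terminal states keep value 0
                continue
            best = max(action_value(V, o, gamma) for o in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(P, V, gamma):
    """Pick, in every non-terminal state, the action with the highest value."""
    return {
        s: max(actions, key=lambda a: action_value(V, actions[a], gamma))
        for s, actions in P.items()
        if actions
    }


V = value_iteration(P, GAMMA)
policy = greedy_policy(P, V, GAMMA)
print(V, policy)
```

Every update here reads directly from `P`. A reinforcement learning algorithm would instead have to estimate the same quantities from sampled interaction, because it is not given `P`.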
The simplest context in which to think about reinforcement learning is in games with a clear objective and a point system.


Diagram from Berkeley's CS 294: Deep Reinforcement Learning by John Schulman & Pieter Abbeel
For example, consider a game in which a mouse is looking for the cheese at the end of a maze (+500 points) or the lesser reward of water along the way (+10 points). Meanwhile, the mouse tries to avoid electric shocks (-100 points).
The reward is not always immediate. Here, the robot-mouse may have to travel a long stretch of the maze, walking through the paths and facing several decision points before reaching the cheese.
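One common way to compare delayed rewards with immediate ones is the discounted return, in which a reward received t steps in the future is weighted by gamma to the power t. The sketch below uses the point values from the maze example; the discount factor of 0.9 and the two reward sequences are assumptions made for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical paths through the maze, one reward per step taken.
long_path_to_cheese = [0, 10, 0, 0, 500]  # water on the way, cheese at the end
short_path_to_shock = [0, -100]           # electric shock after two steps

print(discounted_return(long_path_to_cheese))
print(discounted_return(short_path_to_shock))
```

With a discount factor below 1 the cheese is worth less the longer it takes to reach, yet the long path still scores far higher than the path that ends in a shock.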
The agent observes the environment, takes an action to interact with the environment, and receives a positive or negative reward.
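The observe, act and receive-reward cycle can be sketched on a simplified version of the mouse maze. The example below uses tabular Q-learning as one illustrative algorithm; the text does not prescribe it. The five-cell corridor layout, the `step` function and the hyperparameters (`alpha`, `gamma`, `epsilon`, episode counts) are all assumptions.

```python
import random

# Hypothetical 1-D maze: cell 0 = electric shock, cell 3 = water, cell 4 = cheese.
N_CELLS = 5
START = 1
SHOCK, WATER, CHEESE = 0, 3, 4
ACTIONS = ("left", "right")


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = state - 1 if action == "left" else state + 1
    if next_state == SHOCK:
        return next_state, -100.0, True
    if next_state == CHEESE:
        return next_state, 500.0, True
    if next_state == WATER:
        return next_state, 10.0, False
    return next_state, 0.0, False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # The agent observes its state and chooses an action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            # The environment responds with a new state and a reward.
            next_state, reward, done = step(state, action)
            # The agent updates its estimate from the reward it received.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (1, 2, 3)}
print(policy)
```

The agent never reads the maze layout directly. It only sees the states and rewards returned by `step`, which is what separates this loop from a planning method that is handed the model.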
With the advance of neural networks, deep reinforcement learning, an approach that uses neural networks to evaluate states and actions (e.g. by estimating Q-values), has become more popular. It allows researchers and engineers to create agents that do well in more complex environments.
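The idea of estimating Q-values with a parameterized function rather than a table can be sketched without a deep learning library. The example below deliberately swaps the neural network for a linear model, which is the simplest function approximator. In deep reinforcement learning a neural network takes the place of `q_value`, and its gradient takes the place of the feature vector in the update. The feature design, the step size and the sample transition are assumptions.

```python
def features(state, n_cells=5):
    """Hypothetical feature vector: normalized position plus a bias term."""
    return [state / (n_cells - 1), 1.0]


def q_value(weights, state, action):
    """Linear estimate of Q(state, action): dot product of weights and features."""
    return sum(w * x for w, x in zip(weights[action], features(state)))


def td_update(weights, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step; returns the TD error before the update."""
    future = 0.0 if done else max(q_value(weights, next_state, a) for a in weights)
    td_error = reward + gamma * future - q_value(weights, state, action)
    # For a linear model, the gradient of Q with respect to the weights
    # is simply the feature vector of the current state.
    weights[action] = [
        w + alpha * td_error * x for w, x in zip(weights[action], features(state))
    ]
    return td_error


weights = {"left": [0.0, 0.0], "right": [0.0, 0.0]}
before = q_value(weights, 3, "right")

# Sample transition from the maze example: stepping right from cell 3 reaches the cheese.
td_update(weights, 3, "right", reward=500.0, next_state=4, done=True)

after = q_value(weights, 3, "right")
neighbor = q_value(weights, 2, "right")  # cell 2 was never visited
print(before, after, neighbor)
```

Because the weights are shared across states, a single update also changes the estimate for cell 2, which was never visited. This generalization is what lets function approximation scale to environments with too many states for a table.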
Due to its generality, reinforcement learning is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms.
Application areas include:
- Computer Games
- Industrial and Financial systems
- Vehicle control
- Robotics applications
- Traffic Systems
- Communication networks and Cognitive Radio
- Power systems
- Computer vision
Companies
- DeepMind (Demis Hassabis; London): AI research and application