Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. It allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance. The agent needs only simple reward feedback to learn its behavior; this feedback is known as the reinforcement signal.

Reinforcement learning is defined by a specific type of problem, and all its solutions are classed as reinforcement learning algorithms. In the problem, an agent must decide the best action to select based on its current state. The environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context use dynamic programming techniques. The main difference between the classical dynamic programming techniques and reinforcement learning algorithms is that the latter do not need knowledge of the MDP and target large MDPs where exact methods become impractical. The problem has also been studied in the theory of optimal control, though most studies there are concerned with the existence of optimal solutions and their characterization rather than with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to analyze how equilibrium may arise under bounded rationality.
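To make the contrast with classical dynamic programming concrete, here is a minimal sketch of value iteration on a tiny MDP. The three states, two actions, transition table `P`, reward table `R`, and discount factor `gamma` are all hypothetical values chosen for illustration. Note that the method reads `P` and `R` directly, which is exactly the knowledge of the MDP that reinforcement learning algorithms do without.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP (all numbers are illustrative).
# P[a][s][s2] = probability of moving from state s to state s2 under action a.
P = np.array([
    [[1.0, 0.0, 0.0],   # action 0: stay in place
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0],   # action 1: advance to the next state
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]],
])
# R[a][s] = expected immediate reward for taking action a in state s.
R = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)      # action values, shape (n_actions, n_states)
        V_new = Q.max(axis=0)        # best achievable value in each state
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

values, policy = value_iteration(P, R)
print(values, policy)
```

With a model this small the exact solution is cheap. The cost of sweeping over every state is what makes exact methods impractical for large MDPs.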

The simplest context in which to think about reinforcement learning is in games with a clear objective and a point system.

For example, consider a game in which a mouse is looking for the cheese at the end of a maze (+500 points) or the lesser reward of water along the way (+10 points). Meanwhile, the mouse tries to avoid electric shocks (-100 points).

The reward is not always immediate. Here, the robot-mouse may have to travel a long stretch of the maze, walking through its paths and facing several decision points before reaching the cheese.
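One common way to account for delayed rewards is to sum them with a discount factor, so that rewards further in the future count for less. The sketch below assumes a discount factor `gamma` and a hypothetical five-step path through the maze. Neither is specified in the text above; only the point values (+10 for water, +500 for cheese) come from the example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical path: water on the second step, cheese on the fifth.
path_rewards = [0, 10, 0, 0, 500]

print(discounted_return(path_rewards, gamma=1.0))  # undiscounted total: 510.0
print(discounted_return(path_rewards, gamma=0.9))  # approximately 337.05
```

With `gamma` below 1 the cheese is worth less the longer it takes to reach. This gives the agent a reason to prefer shorter paths.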

The agent observes the environment, takes an action to interact with the environment, and receives a positive or negative reward.
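The observe, act, and receive-reward loop can be sketched with tabular Q-learning, one of many possible algorithms. The sketch makes several assumptions for illustration: the maze is reduced to a hypothetical one-dimensional corridor, water is rewarded every time its cell is entered, and the learning rate, discount factor, and exploration rate are arbitrary choices. The agent never reads the rules inside `step`; it only sees the states and rewards that come back.

```python
import random

N_CELLS = 7                      # corridor cells 0..6 (hypothetical layout)
START = 2
SHOCK, WATER, CHEESE = 0, 4, 6   # special cells
ACTIONS = (-1, +1)               # move left or move right

def step(state, action):
    """Environment: apply the move and return (next_state, reward, done)."""
    nxt = state + action
    if nxt == CHEESE:
        return nxt, 500.0, True
    if nxt == SHOCK:
        return nxt, -100.0, True
    return nxt, (10.0 if nxt == WATER else 0.0), False

def greedy_action(q_row):
    """Index of the action with the highest estimated value."""
    return max(range(len(q_row)), key=lambda i: q_row[i])

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]   # q[state][action index]
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 100:        # cap the episode length
            # Observe the state, then explore with probability epsilon.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy_action(q[state])
            # Act, then receive the next state and the reward.
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update toward reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state, steps = nxt, steps + 1
    return q

q_table = train()
greedy = [ACTIONS[greedy_action(q_table[s])] for s in range(1, CHEESE)]
print(greedy)   # learned move for each non-terminal cell 1..5
```

After training, the greedy move in every non-terminal cell should point toward the cheese, even though the water offers a smaller reward that arrives sooner.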

With the advance of neural networks, deep reinforcement learning, a strategy that uses neural networks to evaluate states and actions (e.g. by estimating Q-values), has become more popular. It allows researchers and engineers to create agents that do well in more complex environments.
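As a rough illustration of estimating Q-values with a neural network, the sketch below uses a small two-layer network written in NumPy and applies one temporal-difference update per call. The layer sizes, learning rate, and discount factor are hypothetical. It is not a full deep reinforcement learning system: practical implementations usually add components such as experience replay and a separate target network, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, HIDDEN, N_ACTIONS = 4, 16, 2   # hypothetical sizes

params = {
    "W1": rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN)),
    "b1": np.zeros(HIDDEN),
    "W2": rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS)),
    "b2": np.zeros(N_ACTIONS),
}

def q_values(params, state):
    """Forward pass: one ReLU hidden layer, one Q-value per action."""
    hidden = np.maximum(0.0, state @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"], hidden

def td_update(params, state, action, reward, next_state, done,
              gamma=0.99, lr=0.01):
    """One gradient step on 0.5 * td_error**2, holding the target fixed."""
    q, hidden = q_values(params, state)
    next_q, _ = q_values(params, next_state)
    target = reward if done else reward + gamma * next_q.max()
    td_error = target - q[action]

    # Backpropagate through the output layer, then the hidden layer.
    grad_q = np.zeros(N_ACTIONS)
    grad_q[action] = -td_error
    grad_pre = (params["W2"] @ grad_q) * (hidden > 0)   # ReLU gradient mask

    params["W2"] -= lr * np.outer(hidden, grad_q)
    params["b2"] -= lr * grad_q
    params["W1"] -= lr * np.outer(state, grad_pre)
    params["b1"] -= lr * grad_pre
    return td_error

# Repeating the update on one terminal transition shrinks the error.
s = np.array([1.0, 0.0, 0.5, -0.5])
first_error = td_update(params, s, action=1, reward=1.0, next_state=s, done=True)
for _ in range(200):
    last_error = td_update(params, s, action=1, reward=1.0, next_state=s, done=True)
print(abs(first_error), abs(last_error))
```

The network replaces the lookup table of Q-values. This lets the agent generalize across states it has never visited exactly, which a table cannot do.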

Due to its generality, reinforcement learning is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In the operations research and control literature, reinforcement learning is called approximate dynamic programming or neuro-dynamic programming.

Application areas of reinforcement learning include:

- Computer games
- Industrial and financial systems
- Vehicle control
- Robotics applications
- Traffic systems
- Communication networks and cognitive radio
- Power systems
- Computer vision

## Timeline

## People

## Further reading

- A Generalized Reinforcement-Learning Model: Convergence and Application, by Michael L. Littman and Csaba Szepesvári
- Ant-Q: A Reinforcement Learning approach to the traveling salesman problem, by Luca M. Gambardella and Marco Dorigo (academic paper)
- Deep Reinforcement Learning in Action, by Alexander Zai and Brandon Brown (web)
- Grokking Deep Reinforcement Learning, by Miguel Morales (web)
- Machine Learning for Humans, Part 5: Reinforcement Learning, by Vishal Maini. Covers exploration and exploitation, Markov decision processes, Q-learning, policy learning, and deep reinforcement learning.
- Playing Atari with Deep Reinforcement Learning, by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra and Martin Riedmiller
- Reinforcement learning algorithms with function approximation: Recent advances and applications, by Xin Xu, Lei Zuo and Zhenhua Huang (academic paper)
- Reinforcement Learning in Motion, by Phil Tabor (web)
- Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto (2014)
- Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, by Shakir Mohamed and Danilo Jimenez Rezende (academic paper)

## Documentaries, videos and podcasts

## Companies

- DeepMind (Demis Hassabis; London): AI research and application