Reinforcement Learning

An area of machine learning concerned with how machines and software agents act in a specific context to maximize performance, guided by a reward known as the reinforcement signal.

Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. It allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize their performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.

Reinforcement learning is defined by a specific type of problem, and all its solutions are classed as reinforcement learning algorithms. In the problem, an agent is supposed to decide the best action to select based on its current state. The environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context use dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need an exact model of the MDP, and they target large MDPs where exact methods become impractical. The problem has also been studied in the theory of optimal control, though most studies there are concerned with the existence of optimal solutions and their characterization rather than with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to analyze how equilibrium may arise under bounded rationality.
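To make the contrast with classical dynamic programming concrete, the following is a minimal sketch of value iteration on a tiny MDP whose transition probabilities and rewards are fully known. The three-state chain, the action names, and the discount factor `gamma` are illustrative assumptions, not taken from any particular source. A reinforcement learning algorithm would have to reach a similar result without being handed the `transitions` table.

```python
# Hypothetical 3-state chain MDP (state 2 is terminal).
# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.9, 2, 10.0), (0.1, 0, 0.0)]},
    2: {},  # terminal state: no actions available
}


def action_value(outcomes, values, gamma):
    """Expected reward plus discounted value of the successor states."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal states keep value 0
                continue
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each non-terminal state, the action with the highest value."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
        if actions
    }


values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
print(values, policy)
```

This approach needs the full model and sweeps over every state, which is the part that becomes impractical for large MDPs.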

The simplest context in which to think about reinforcement learning is in games with a clear objective and a point system.

Diagram from Berkeley's CS 294: Deep Reinforcement Learning by John Schulman & Pieter Abbeel

For example, consider a game in which a mouse is looking for the cheese at the end of a maze (+500 points), or the lesser reward of water along the way (+10 points). Meanwhile, the mouse tries to avoid electric shocks (-100 points).

The reward is not always immediate. Here, the robot-mouse may have to travel a long stretch of the maze, walking through its paths and facing several decision points before it reaches the cheese.
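One common way to account for delayed rewards is to sum them with a discount factor, so that a reward received k steps in the future is weighted by gamma^k. The sketch below assumes this discounted-return convention; the factor `gamma` and the example episode are hypothetical, and only the point values come from the maze example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step k is weighted by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: three empty steps, water (+10),
# two more empty steps, then the cheese (+500).
episode = [0, 0, 0, 10, 0, 0, 500]
print(discounted_return(episode, gamma=0.9))
print(discounted_return(episode, gamma=1.0))  # undiscounted total
```

With `gamma` below 1, the cheese still dominates the return but counts for less the further away it is, which gives the agent a reason to prefer shorter paths.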

The agent observes the environment, takes an action to interact with the environment, and receives a positive or negative reward.
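The observe, act, and receive-reward cycle can be sketched as a simple loop. The `CorridorMaze` class below is a hypothetical one-dimensional stand-in for the maze, using the point values from the mouse example; its layout, the `reset`/`step` method names, and the two example policies are assumptions made for illustration.

```python
import random


class CorridorMaze:
    """Hypothetical 1-D maze with cells 0..5; the mouse starts in cell 2.

    Cell 0 gives an electric shock (-100) and ends the episode, cell 3
    holds water (+10, collected once), and cell 5 holds the cheese
    (+500) and ends the episode.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        self.position = 2
        self.water_left = True
        return self.position

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position += action
        if self.position == 0:
            return self.position, -100, True
        if self.position == 5:
            return self.position, 500, True
        if self.position == 3 and self.water_left:
            self.water_left = False
            return self.position, 10, False
        return self.position, 0, False


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total (undiscounted) reward."""
    state = env.reset()                          # agent observes the environment
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward                   # agent receives the reward
        if done:
            break
    return total_reward


def random_policy(state):
    return random.choice([-1, 1])


def always_right(state):
    return 1


print(run_episode(CorridorMaze(), always_right))
print(run_episode(CorridorMaze(), random_policy))
```

Neither policy here learns anything; a learning agent would use the rewards it receives inside this loop to improve its policy over many episodes.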

With the advance of neural networks, deep reinforcement learning, a strategy that uses neural networks to evaluate states (e.g. to estimate Q-values), has become more popular. It allows researchers and engineers to create agents that do well in more complex environments.
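To show what a Q-value is, the sketch below uses tabular Q-learning, which stores one estimate per state-action pair in a dictionary. This is a simpler stand-in, not deep reinforcement learning: a deep method would replace the table with a neural network that predicts the same quantities. The corridor environment, the hyperparameters (`alpha`, `gamma`, `epsilon`), and all function names are assumptions chosen for illustration.

```python
import random

N_CELLS = 6          # hypothetical corridor: shock at cell 0, cheese at cell 5
ACTIONS = (-1, +1)   # move left or right
START = 2


def step(state, action):
    """Return (next_state, reward, done) for the corridor."""
    next_state = state + action
    if next_state == 0:
        return next_state, -100.0, True
    if next_state == N_CELLS - 1:
        return next_state, 500.0, True
    return next_state, 0.0, False


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking among best actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Terminal states have no future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted best next value.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action in each non-terminal cell."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_CELLS - 1)}


q = q_learning()
print(greedy_policy(q))
```

The table works here because the corridor has only a handful of states. In complex environments the number of states is far too large to enumerate, which is the motivation for approximating Q-values with a neural network.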

Due to its generality, reinforcement learning is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In the operations research and control literature, reinforcement learning is called approximate dynamic programming or neuro-dynamic programming.

Practical Applications of Reinforcement Learning
  • Computer Games

