OpenAI Five

OpenAI Five is a team of five neural networks, developed by the AI research company OpenAI, which competed publicly in a number of Dota 2 exhibition events from 2018 to 2019.



Website
openai.com/research/openai-five
openai.com/five

Is a
Product

Product attributes

Industry
Artificial Intelligence (AI)
eSports
Video game industry

Launch Date
March 2017

Product Parent Company
OpenAI

Other attributes

Wikidata ID
Q60748043
Overview

OpenAI Five is a team of five neural networks, developed by the AI research company OpenAI, which competed publicly in a number of Dota 2 matches in 2018 and 2019. In April 2019, OpenAI Five became the first AI to beat reigning world champions in an eSports game, winning back-to-back games against OG, winners of The International (Dota 2's premier annual tournament) in 2018. OpenAI's first Dota-playing AI competed in 1v1 matches in 2017, beating many top professionals. The first version of OpenAI Five began competing in 5v5 Dota in 2018, beating a team of 99.95th-percentile players before losing games against top Dota 2 players at The International 2018. After defeating OG in 2019, OpenAI retired OpenAI Five from competition and incorporated the technology into the company's subsequent work.

OpenAI Five (June 25, 2018)

Dota 2 offers novel challenges for AI systems, including long time horizons, imperfect information, and complex, continuous state-action spaces; these are the kinds of challenges that matter when building capable AI systems. OpenAI began developing OpenAI Five to work on deep reinforcement learning (RL), the technique of training a deep neural network to achieve specific goals through rewards and penalties. OpenAI wanted to apply deep RL to a problem that was unsolvable at the time in order to improve the capabilities of its systems. While the company planned to implement sophisticated algorithmic ideas, such as hierarchical reinforcement learning, the key improvement needed turned out to be scale. To build OpenAI Five, OpenAI created a system called Rapid that allowed it to run Proximal Policy Optimization (PPO) at unprecedented scale. Rapid was also used beyond games and simulated environments, notably to control a robotic hand.
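
At the core of that training setup is PPO's clipped surrogate objective. The snippet below is a minimal, illustrative sketch of that objective with random numbers standing in for a real training batch; it is not OpenAI's implementation, and the batch size and clipping value are arbitrary.

```python
import numpy as np

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017)."""
    ratio = np.exp(logp_new - logp_old)                 # probability ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # PPO maximizes the minimum of the two terms; negate it to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Illustrative call with random data standing in for a training batch.
rng = np.random.default_rng(0)
batch = 1024
loss = ppo_clipped_loss(
    logp_new=rng.normal(-1.0, 0.1, batch),   # log-probs under the current policy
    logp_old=rng.normal(-1.0, 0.1, batch),   # log-probs under the data-collecting policy
    advantages=rng.normal(0.0, 1.0, batch),  # advantage estimates
)
print(f"clipped PPO loss: {loss:.4f}")
```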

OpenAI Five leverages existing reinforcement learning techniques, scaled up to learn from batches of approximately 2 million frames every 2 seconds. The final version of OpenAI Five (the one that beat the reigning Dota 2 champions) was trained for ten months using a distributed system for continual training, starting in June 2018. Training had to adapt to changes in the model's size as well as changes to the game itself, since several significant game patches released new features during the run.

Dota 2

Dota 2 is played on a square map by two teams, each defending a base situated in one of two opposite corners. The game ends when a team destroys a structure called the "ancient" inside the opposing team's base. Each team has five players, each controlling a hero unit with unique abilities. During the game, each team also has a constant stream of small "creep" units, not controlled by the players, which walk toward the enemy base and attack any opponents (units or buildings) they encounter. Players gather resources from creeps that can be used to increase the power of their hero.

OpenAI Five playing Dota 2

Challenges for AI systems playing Dota 2 include the following:

  • Long time horizons—Dota 2 games run at thirty frames per second and typically last approximately 45 minutes. OpenAI Five selects an action every fourth frame, yielding approximately 20,000 steps per game (see the arithmetic sketch after this list). By comparison, chess usually lasts around 80 moves and Go around 150 moves.
  • Partially-observed state—Each team only sees the portion of the map near their units and buildings. Therefore, playing Dota 2 requires making inferences based on incomplete data and modeling the opponent’s behavior.
  • High-dimensional action and observation spaces—Dota 2 is played on a large map containing ten user-controlled heroes, dozens of buildings, dozens of non-player units, and a long tail of other game features. OpenAI Five observes approximately 16,000 values (mostly floats and categorical values with hundreds of possibilities) each timestep. The model discretizes the action space; on an average timestep it chooses among 8,000 to 80,000 valid actions (depending on the hero). For comparison, chess requires around one thousand values per observation (mostly six-possibility categorical values) and Go around six thousand values (all binary). Chess has a branching factor of around 35 valid actions, and Go around 250.
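
A quick back-of-the-envelope check of the horizon figure quoted above, using only the numbers already given in this list:

```python
# Rough arithmetic behind the "~20,000 steps per game" figure above.
fps = 30                    # Dota 2 runs at thirty frames per second
game_seconds = 45 * 60      # a typical 45-minute game
act_every_n_frames = 4      # OpenAI Five selects an action every fourth frame

steps_per_game = fps * game_seconds // act_every_n_frames
print(steps_per_game)       # 20250, i.e. roughly 20,000 decision steps
                            # (versus ~80 moves in chess and ~150 in Go)
```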

OpenAI Five plays Dota 2 with two limitations from the regular game:

  1. Players in a normal game can select from a pool of 117 heroes; OpenAI Five supports only a subset of seventeen of them.
  2. Items that allow a player to temporarily control multiple units at the same time (e.g., Illusion Rune, Helm of the Dominator, Manta Style, and Necronomicon) are removed, because giving the neural network the ability to control multiple units would add significant technical complexity.

For each timestep (every fourth frame, with the game running at thirty frames per second), OpenAI Five receives an observation from the game engine encoding all the information a human player would see, and returns a discrete action to the game engine encoding its desired movement, attack, ability use, and so on. Some game mechanics were controlled by hand-scripted logic rather than the policy: the order in which heroes purchase items and abilities, control of the unique courier unit, and which items heroes keep in reserve. During training, some properties of the environment were randomized, including the heroes in the game and the items the heroes purchased. This improved the diversity of the training games, helping ensure the models could respond to a wide variety of strategies and situations.
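
The per-timestep loop described above can be sketched schematically as follows. The function names (get_observation, policy_step, send_action) and the toy observation are hypothetical stand-ins for the game-engine interface and the policy, not OpenAI's actual API.

```python
import random

def get_observation(game_state):
    """Hypothetical stand-in: encode everything a human on this team could
    see into a flat vector of floats and categorical IDs (~16,000 values)."""
    return [0.0] * 16  # toy-sized observation for illustration

def policy_step(observation, memory):
    """Hypothetical stand-in for the policy: map the observation and the
    recurrent memory to a discrete action ID plus updated memory."""
    action = random.randrange(10)  # placeholder for sampling from the policy
    return action, memory

def send_action(game_state, action):
    """Hypothetical stand-in: hand the chosen discrete action back to the
    game engine (movement, attack, ability use, etc.)."""
    pass

def play_episode(game_state, num_steps=20_000):
    memory = None
    for _ in range(num_steps):  # one iteration per acted-on frame
        obs = get_observation(game_state)
        action, memory = policy_step(obs, memory)
        send_action(game_state, action)
        # Hand-scripted logic (item/ability purchase order, courier control,
        # reserve items) would run alongside the policy at this point.

play_episode(game_state=None)
```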

The policy is defined as a function from the history of observations to a probability distribution over actions, which OpenAI parameterized as a recurrent neural network with approximately 159 million parameters, consisting primarily of a single-layer, 4096-unit long short-term memory (LSTM) network. Given a policy, the model plays a game by repeatedly passing in the current observation and sampling an action from the output distribution at each timestep.
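
The structure described above, an LSTM over observations feeding a distribution over discrete actions, can be sketched in miniature as below. The sizes are toy values and the action head is a single softmax; the real network uses a 4096-unit LSTM, roughly 159 million parameters, and a far more structured observation encoder and action space.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN, NUM_ACTIONS = 32, 64, 10   # toy sizes, not the real 16k/4096

# LSTM parameters for the input, forget, output, and candidate gates (stacked),
# plus a linear "action head" mapping the hidden state to action logits.
W = rng.normal(0, 0.1, (4 * HIDDEN, OBS_DIM + HIDDEN))
b = np.zeros(4 * HIDDEN)
W_act = rng.normal(0, 0.1, (NUM_ACTIONS, HIDDEN))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(obs, h, c):
    """One LSTM cell update given an observation and the recurrent state."""
    z = W @ np.concatenate([obs, h]) + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def sample_action(h):
    """Softmax over the action logits, then sample a discrete action."""
    logits = W_act @ h
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(NUM_ACTIONS, p=probs)

# Roll the policy forward over a short sequence of random observations.
h, c = np.zeros(HIDDEN), np.zeros(HIDDEN)
for t in range(5):
    obs = rng.normal(0, 1, OBS_DIM)   # stand-in for the encoded game state
    h, c = lstm_step(obs, h, c)
    print(f"timestep {t}: action {sample_action(h)}")
```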

History
1v1 Dota 2 bot

OpenAI first began developing AI systems to compete in Dota 2 with a bot for 1v1 matches under standard tournament rules. The bot learned the game from self-play, not using imitation learning or tree search. In August 2017, OpenAI's bot played many top professional players, including SumaiL, the top 1v1 player in the world at the time, and Arteezy, the top overall player in the world at the time. On August 11, 2017, OpenAI's Dota bot played Dendi on the mainstage at The International, winning a best-of-three match. This success required the bot to develop intuitions about its opponents to react accordingly.

Learned Bot Behaviors (Feb 15, 2018)

Results from the 1v1 bot showed that with enough compute, self-play can generate a machine-learning system that progresses from below human level to beating the best players in the world. While supervised deep learning requires high-quality data for training, self-play systems generate automatically improving data as they progress. OpenAI estimated the compute required to train its 1v1 model to be 8 petaflop/s-days.

OpenAI published a timeline of its bot's progression in 2017:

  • March 1st—has its first classical reinforcement learning results in a simple Dota environment.
  • Early June—beats 1.5k matchmaking ranking (MMR) tester
  • June 30th—wins the majority of games against 3k MMR tester
  • July 8th—gets first win against 7.5k MMR semi-pro tester
  • August 7th—beats Blitz (6.2k former pro) 3–0, Pajkatt (8.5k pro) 2–1, and CC&C (8.9k pro) 3–0
  • August 9th—beats Arteezy (10k pro, top player) 10–0
  • August 10th—beats Sumail (8.3k pro, top 1v1 player) 6–0
  • August 11th—beats Dendi (7.3k pro, former world champion) 2–0. This version has a 60% win rate versus the August 10th version.
OpenAI Five
2018

OpenAI Five began defeating amateur human teams at the more complicated 5v5 Dota 2 in June 2018. Matches were played under a number of restrictions, including those below:

  • Mirror match of Necrophos, Sniper, Viper, Crystal Maiden, and Lich
  • No warding
  • No Roshan
  • No invisibility (consumables and relevant items)
  • No summons/illusions
  • No Divine Rapier, Bottle, Quelling Blade, Boots of Travel, Tome of Knowledge, Infused Raindrop
  • Five invulnerable couriers, no exploiting them by scouting or tanking
  • No Scan

To train, OpenAI Five plays 180 years' worth of games against itself every day, learning via self-play. It uses a scaled-up version of Proximal Policy Optimization running on 256 GPUs and 128,000 CPU cores, a much larger-scale version of the system built previously for the 1v1 variant of the game.

Comparison of the OpenAI 1v1 bot and OpenAI Five

Batch size: 8,388,608 observations (1v1 bot) vs. 1,048,576 observations (OpenAI Five)
Batches per minute: ~20 (1v1 bot) vs. ~60 (OpenAI Five)
CPUs: 60,000 CPU cores on Azure (1v1 bot) vs. 128,000 preemptible CPU cores on GCP (OpenAI Five)
Experience collected: ~300 years per day (1v1 bot) vs. ~180 years per day, or ~900 years per day counting each hero separately (OpenAI Five)
GPUs: 256 K80 GPUs on Azure (1v1 bot) vs. 256 P100 GPUs on GCP (OpenAI Five)
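
As a rough consistency check, the OpenAI Five figures in this comparison line up with the training throughput quoted earlier (batches totaling roughly 2 million frames every 2 seconds), assuming the batch size and batch rate listed above:

```python
# Tie the OpenAI Five figures above to the "~2 million frames every 2 seconds"
# throughput mentioned earlier in the article.
batch_size = 1_048_576       # observations per batch (OpenAI Five)
batches_per_minute = 60      # approximate, from the comparison above

obs_per_second = batch_size * batches_per_minute / 60
print(obs_per_second)        # ~1.05 million observations per second
print(obs_per_second * 2)    # ~2.1 million per 2 seconds, matching the text
```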

On August 5, 2018, OpenAI Five won a best-of-three match against a team of 99.95th-percentile Dota players: Blitz, Cap, Fogged, Merlini, and MoonMeander, four of whom have played Dota professionally. The match was played in front of a live audience and 100,000 concurrent livestream viewers. After going down 2-0, the human team won game three, with the audience adversarially selecting OpenAI Five's heroes.

During the event, OpenAI revealed OpenAI Five's ability to draft (select the heroes to play). An important part of the game, drafting is complicated by the way heroes interact with each other. In June 2018, OpenAI had added a win-probability output to its neural network, which can be used to evaluate the win probability of any draft. After the game 1 draft, OpenAI Five predicted a 95 percent win probability; it won the first game in 21 minutes and 37 seconds. After the game 2 draft, it predicted a 76.2 percent win probability and won the second game in 24 minutes and 53 seconds. OpenAI estimated that the August 5th model required 35 petaflop/s-days of training, more than triple the 11 petaflop/s-days of the June version that first began beating amateur teams.
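
One way such a win-probability output could be used for drafting is sketched below. Everything here is hypothetical: predict_win_probability stands in for the network's win-probability head, the simultaneous five-hero pick ignores Dota 2's real alternating draft, and the hero pool is just an illustrative subset.

```python
import itertools
import random

HERO_POOL = ["Necrophos", "Sniper", "Viper", "Crystal Maiden", "Lich", "Lion",
             "Gyrocopter", "Shadow Fiend", "Slark", "Riki", "Sven", "Tidehunter"]

def predict_win_probability(our_draft, their_draft):
    """Hypothetical stand-in for the neural network's win-probability head."""
    random.seed(hash((tuple(sorted(our_draft)), tuple(sorted(their_draft)))))
    return random.random()

def best_draft(their_draft, pool=HERO_POOL, team_size=5):
    """Score every available five-hero lineup and keep the highest-probability one."""
    available = [hero for hero in pool if hero not in their_draft]
    candidates = itertools.combinations(available, team_size)
    return max(candidates,
               key=lambda draft: predict_win_probability(draft, their_draft))

opponent_draft = ["Sniper", "Viper", "Lich", "Lion", "Riki"]
print(best_draft(opponent_draft))
```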

OpenAI Five went on to compete at The International 2018 in Vancouver, losing two show matches against top Dota 2 players: to paiN Gaming on August 22 and to Big God on August 23. Compared to the benchmark games earlier in August, the matches at The International were played against significantly better players, with hero lineups provided by a third party, and without a key earlier restriction: instead of each hero having its own invulnerable courier, the teams shared a single, killable courier as in the standard game. The games remained close, with OpenAI Five having a good chance of winning for the first twenty to thirty-five minutes of both games.

2019

On April 13, 2019, the OpenAI Five Finals were held in the Bay Area, with the AI competing against the reigning Dota 2 world champions (winners of The International 2018) OG. The matches were played in front of a live audience and live-streamed on Twitch. OpenAI Five defeated OG, winning back-to-back games, becoming the first AI to beat world champions in an eSports game. Following its losses at The International 2018, OpenAI Five was upgraded with 8x more training compute. In total, the April 13th version of OpenAI Five consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over ten real-time months (up from about 10,000 years over 1.5 real-time months as of The International 2018), for an average of 250 years of simulated experience per day. The Finals version of OpenAI Five has a 99.9% win rate versus The International 2018 version.

The OpenAI Five Finals was the last time the AI competed publicly; OpenAI retired the models after defeating the world champions, and the technology developed was incorporated into the company's subsequent work.


Further Resources

Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI: Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d. O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang
https://arxiv.org/abs/1912.06680
December 13, 2019
