
NVIDIA H100 Tensor Core GPU

The Nvidia H100 Tensor Core GPU is a graphics processing unit developed by Nvidia that implements the Hopper architecture.


Contents

  • Overview
  • Features
  • Product specifications for the H100 GPUs
  • History
  • Further Resources
Website
nvidia.com/en-us/data-center/h100/

Is a
Product

Product attributes

Launch Date
September 20, 2022

Industry
Generative AI
Graphics processing unit
Artificial Intelligence (AI)
GPU computing

Product Parent Company
NVIDIA

Competitors
Microsoft Azure Maia 100 AI Accelerator
AMD Instinct MI300X

Other attributes

Date Announced
March 22, 2022

Named After
Grace Hopper
Overview

H100 is Nvidia's ninth-generation data center GPU, designed to deliver significantly better performance for large-scale AI and high-performance computing (HPC) workloads than the company's previous-generation A100 Tensor Core GPU. It is fabricated on Taiwan Semiconductor Manufacturing Company's 4N process, customized for Nvidia, with 80 billion transistors and multiple architectural advances. Nvidia states that, for mainstream AI and HPC models, H100 with InfiniBand interconnect delivers up to 30 times the performance of A100. The Nvidia NVLink Switch System allows up to 256 H100 GPUs to be connected to accelerate exascale workloads, and a dedicated Transformer Engine accelerates trillion-parameter language models.

The Nvidia H100 Tensor Core GPU is used by the following leading AI companies:

  • OpenAI used NVIDIA A100 GPUs to train and run ChatGPT and will use H100 on its Azure supercomputer to power its continuing AI research.
  • Meta, a technology partner of NVIDIA, developed a Hopper-based AI supercomputer called Grand Teton.
  • Stability AI is an H100 early-access customer on Amazon Web Services, using H100 to accelerate its video, 3D, and multimodal models.
Features
Streaming multiprocessor (SM)
  • Fourth-generation tensor cores are up to six times faster than the A100's. On a per-SM basis, the tensor cores deliver double the matrix multiply-accumulate (MMA) computational rate of the A100 SM on equivalent data types.
  • Dynamic programming X (DPX) instructions accelerate dynamic programming algorithms by up to seven times compared to the A100 GPU.
  • IEEE FP64 and FP32 processing rates are three times faster than on the A100.
  • The thread block cluster feature allows programmatic control of locality at a granularity larger than a single thread block on a single SM (see the sketch after this list).
  • Asynchronous execution features include a new tensor memory accelerator (TMA) unit that transfers large blocks of data efficiently between global and shared memory.
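
The thread block cluster feature can be illustrated with a short kernel. Below is a minimal CUDA sketch, assuming CUDA 12+ and an sm_90 (H100) build target; the kernel name, cluster shape, and launch configuration are illustrative rather than taken from Nvidia documentation. One block in a cluster writes its shared memory, and a sibling block reads it through distributed shared memory:

```
// Thread block cluster sketch (requires CUDA 12+ and sm_90; names are
// illustrative). Compile with: nvcc -arch=sm_90 cluster_sketch.cu
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// __cluster_dims__(2, 1, 1) groups every 2 thread blocks into one cluster.
__global__ void __cluster_dims__(2, 1, 1) cluster_kernel(int *out) {
    cg::cluster_group cluster = cg::this_cluster();
    __shared__ int smem[1];

    // Block rank 0 writes its own shared memory.
    if (cluster.block_rank() == 0 && threadIdx.x == 0) smem[0] = 42;
    cluster.sync();  // make the write visible across the cluster

    // Block rank 1 reads rank 0's shared memory directly: distributed
    // shared memory, i.e. locality control beyond a single block/SM.
    if (cluster.block_rank() == 1 && threadIdx.x == 0) {
        int *remote = cluster.map_shared_rank(smem, 0);
        out[0] = *remote;
    }
    cluster.sync();  // keep rank 0 resident until the remote read finishes
}

int main() {
    int *out;
    cudaMallocManaged(&out, sizeof(int));
    cluster_kernel<<<2, 32>>>(out);  // 2 blocks = exactly one cluster
    cudaDeviceSynchronize();
    printf("value read across the cluster: %d\n", out[0]);
    cudaFree(out);
    return 0;
}
```

Grid size must be a whole multiple of the cluster size; here the two launched blocks form a single cluster, and the read crosses the SM boundary.
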
Transformer engine

The H100's new transformer engine uses a combination of software and custom Hopper tensor core technology to accelerate transformer model training and inference. The engine dynamically chooses between FP8 and 16-bit calculations, automatically re-casting and scaling between the two in each layer, to deliver up to nine times faster AI training and up to 30 times faster AI inference on large language models than the prior-generation A100.
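
The numerical core of that re-casting is per-tensor scaling: choosing a scale factor so the tensor's largest magnitude lands near FP8's representable maximum. The sketch below shows the arithmetic in CUDA C++ using the __nv_fp8_e4m3 type from cuda_fp8.h (CUDA 11.8+); it is a conceptual illustration, not Nvidia's transformer engine implementation, and the sample values are made up:

```
// Per-tensor FP8 scaling sketch (conceptual; not NVIDIA's implementation).
#include <cuda_fp8.h>   // __nv_fp8_e4m3 (CUDA 11.8+)
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<float> tensor = {0.002f, -1.7f, 3.14f, -0.25f};  // sample data

    // amax: the largest magnitude in the tensor. In practice the transformer
    // engine tracks an amax history across steps ("delayed scaling").
    float amax = 0.f;
    for (float v : tensor) amax = std::fmax(amax, std::fabs(v));
    if (amax == 0.f) amax = 1.f;          // avoid dividing by zero

    const float fp8_max = 448.f;          // largest finite E4M3 value
    const float scale = fp8_max / amax;   // map amax onto the FP8 range

    for (float v : tensor) {
        __nv_fp8_e4m3 q(v * scale);       // quantize: scale, then cast to FP8
        float back = float(q) / scale;    // dequantize with the inverse scale
        printf("%+.4f -> %+.4f\n", v, back);
    }
    return 0;
}
```
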

HBM3 memory subsystem

The HBM3 memory subsystem provides nearly double the bandwidth of the previous generation. The H100 SXM5 GPU is the world's first GPU with HBM3 memory, delivering 3 TB/sec of memory bandwidth.
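
The 3 TB/sec figure can be sanity-checked with simple arithmetic. The sketch below assumes the commonly reported H100 SXM5 configuration, a 5120-bit HBM3 interface running at roughly 4.8 Gb/sec per pin; both numbers come from public reporting rather than this article:

```
// Back-of-the-envelope HBM3 bandwidth check (configuration is assumed).
#include <cstdio>

int main() {
    const double bus_bits = 5120.0;   // five active HBM3 stacks x 1024 bits
    const double pin_rate = 4.8e9;    // assumed per-pin data rate, bits/sec
    const double bytes_per_sec = bus_bits * pin_rate / 8.0;
    printf("~%.2f TB/sec\n", bytes_per_sec / 1e12);  // ~3.07 TB/sec
    return 0;
}
```
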

L2 cache architecture

The 50 MB L2 cache holds large portions of models and datasets for repeated access, reducing trips to the HBM3 memory subsystem.
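
CUDA exposes controls for keeping such hot data resident in L2. The sketch below uses the access-policy-window API available since CUDA 11; the set-aside size and the helper name pin_region_in_l2 are illustrative choices, not values from this article:

```
// Reserve part of L2 for persisting accesses and mark a region persistent.
#include <cuda_runtime.h>

void pin_region_in_l2(cudaStream_t stream, void *data, size_t bytes) {
    // Set aside a portion of L2 for persisting accesses (16 MB is arbitrary).
    cudaDeviceSetLimit(cudaLimitPersistingL2CacheSize, 16 * 1024 * 1024);

    cudaStreamAttrValue attr = {};
    attr.accessPolicyWindow.base_ptr  = data;   // region to keep resident
    attr.accessPolicyWindow.num_bytes = bytes;
    attr.accessPolicyWindow.hitRatio  = 1.0f;   // fraction treated as persisting
    attr.accessPolicyWindow.hitProp   = cudaAccessPropertyPersisting;
    attr.accessPolicyWindow.missProp  = cudaAccessPropertyStreaming;

    // Kernels launched on this stream will prefer to keep [data, data+bytes)
    // in the L2 set-aside region.
    cudaStreamSetAttribute(stream, cudaStreamAttributeAccessPolicyWindow, &attr);
}
```
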

Multi-instance GPU

The second-generation multi-instance GPU (MIG) technology provides approximately triple the compute capacity and nearly double the memory bandwidth per GPU instance compared to the A100. For the first time, confidential computing capability is provided with MIG-level trusted execution environments (TEEs).

Confidential computing support

To protect user data, defend against hardware and software attacks, and better isolate and protect VMs from each other in virtualized and MIG environments, H100 implements confidential computing and extends the trusted execution environment (TEE) to CPUs at the full PCIe line rate.

Nvidia NVLink

The fourth-generation Nvidia NVLink provides triple the bandwidth on all-reduce operations and a 50% general bandwidth increase over the third-generation NVLink, for a total of 900 GB/sec of GPU-to-GPU bandwidth.
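
The 50% figure follows from the link counts, which are taken from public Nvidia material rather than this article: A100 exposes 12 NVLink links and H100 exposes 18, each moving 50 GB/sec in both directions combined:

```
// NVLink generation arithmetic (link counts assumed from public material).
#include <cstdio>

int main() {
    const double per_link = 50e9;  // bytes/sec per link, both directions combined
    printf("A100 (12 links): %.0f GB/sec\n", 12 * per_link / 1e9);  // 600
    printf("H100 (18 links): %.0f GB/sec\n", 18 * per_link / 1e9);  // 900
    return 0;
}
```
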

NVSwitch technology

H100 GPUs introduce third-generation NVSwitch technology, with switches residing both inside and outside of nodes to connect multiple GPUs in servers, clusters, and data center environments. Each NVSwitch inside a node provides 64 ports of fourth-generation NVLink to accelerate multi-GPU connectivity. Total switch throughput increases to 13.6 Tbits/sec from 7.2 Tbits/sec in the prior generation. The third-generation NVSwitch also provides hardware acceleration for collective operations, with multicast and NVIDIA SHARP in-network reductions.

NVLink Switch system

Built on third-generation NVSwitch technology and fourth-generation NVLink, the new NVLink Switch system interconnect introduces address space isolation and protection, enabling up to 32 nodes or 256 GPUs to be connected over NVLink in a 2:1-tapered fat-tree topology.

PCIe Gen 5

PCIe Gen 5 provides 128 GB/sec of total bandwidth (64 GB/sec in each direction), compared with 64 GB/sec of total bandwidth (32 GB/sec in each direction) for PCIe Gen 4. It enables H100 to interface with x86 CPUs and SmartNICs/DPUs (data processing units).
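
The per-direction figure follows from the PCIe Gen 5 signaling rate: 32 GT/sec per lane with 128b/130b encoding across 16 lanes gives about 63 GB/sec, conventionally rounded to 64 GB/sec:

```
// PCIe Gen 5 x16 bandwidth arithmetic.
#include <cstdio>

int main() {
    const double transfers = 32e9;          // 32 GT/sec per lane
    const double lanes     = 16.0;
    const double encoding  = 128.0 / 130.0; // 128b/130b line-code efficiency
    const double per_dir   = transfers * lanes * encoding / 8.0;  // bytes/sec
    printf("per direction: ~%.1f GB/sec\n", per_dir / 1e9);       // ~63.0
    return 0;
}
```
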

Product specifications for the H100 GPUs

                      H100 SXM                    H100 PCIe                   H100 NVL
BFLOAT16 Tensor Core  1,979 teraFLOPS             1,513 teraFLOPS             3,958 teraFLOPS
FP16 Tensor Core      1,979 teraFLOPS             1,513 teraFLOPS             3,958 teraFLOPS
FP32                  67 teraFLOPS                51 teraFLOPS                134 teraFLOPS
Decoders              7 NVDEC, 7 JPEG             7 NVDEC, 7 JPEG             14 NVDEC, 14 JPEG
Form factor           SXM                         PCIe dual-slot air-cooled   2x PCIe dual-slot air-cooled

Tensor core figures are with sparsity; the H100 NVL figures aggregate its two PCIe GPUs.

History

Nvidia announced the Hopper architecture and the H100 GPU, the first GPU based on that architecture, on March 22, 2022. The architecture, named after US computer scientist Grace Hopper, succeeds the Nvidia Ampere architecture launched two years earlier. Upon the announcement, Nvidia stated the H100 would be available worldwide later in 2022 from leading cloud service providers and computer makers, as well as directly from Nvidia. Founder and CEO Jensen Huang described the H100 in the announcement as:

The engine of the world's AI infrastructure that enterprises use to accelerate their AI-driven businesses.

On August 31, 2022, US officials announced export restrictions stopping Nvidia from selling its top AI computing chips to China. The restrictions affected Nvidia's A100 and H100 GPUs and threatened to affect the completion of the H100's development.

On September 20, 2022, Nvidia announced the H100 Tensor Core GPU was in full production with global tech partners planning to roll out the first wave of products and services based on the chips in October 2022. At the time of the announcement, H100 GPUs were accessible on Nvidia Launchpad and Dell PowerEdge servers. Customers could begin ordering NVIDIA DGX™ H100 systems. Computer manufacturers were expected to ship H100-powered systems in the following weeks, with over 50 server models on the market by the end of 2022. Manufacturers building systems included:

  • Atos
  • Cisco
  • Dell Technologies
  • Fujitsu
  • GIGABYTE
  • Hewlett Packard Enterprise
  • Lenovo
  • Supermicro

Higher-education and research institutions were also set to receive H100 GPUs to power new supercomputers, including the Barcelona Supercomputing Center, Los Alamos National Lab, the Swiss National Supercomputing Centre (CSCS), the Texas Advanced Computing Center, and the University of Tsukuba.

On March 21, 2023, Nvidia and its partners announced the availability of new products and services that include the H100 Tensor Core GPU. Oracle Cloud Infrastructure (OCI) announced the limited availability of new OCI Compute bare-metal GPU instances featuring H100 GPUs. Amazon Web Services announced upcoming EC2 UltraClusters of Amazon EC2 P5 instances. Microsoft Azure made a private preview announcement the previous week for its H100 virtual machine, ND H100 v5. Meta deployed its H100-powered Grand Teton AI supercomputer internally for its AI production and research teams. Organizations around the world receiving the first wave of DGX H100 systems included:

  • CyberAgent—A Japanese digital advertising and internet services company creating AI-produced digital ads and celebrity digital twin avatars
  • Johns Hopkins University Applied Physics Laboratory—The U.S.’s largest university-affiliated research center is using DGX H100 for training LLMs
  • KTH Royal Institute of Technology—European technical and engineering university based in Stockholm, using DGX H100 to provide computer science programs for higher education
  • Mitsui—A Japanese business group with a wide variety of businesses in fields such as energy, wellness, IT, and communication, began building Japan’s first generative AI supercomputer for drug discovery, powered by DGX H100
  • Telconet—A telecommunications provider in Ecuador building intelligent video analytics for safe cities and language services to support customers across Spanish dialects

On August 8, 2023, Nvidia unveiled the successor to the H100, the GH200 Grace Hopper Superchip. Reports from August 2023 stated that Nvidia was planning to at least triple production of the H100 to meet demand driven by the boom in AI workloads, aiming to ship 500,000 units in 2023 and between 1.5 million and 2 million units in 2024.


Further Resources

  • NVIDIA H100 PCIe GPU (product brief, November 2022): https://www.nvidia.com/content/dam/en-zz/Solutions/gtcs22/data-center/h100/PB-11133-001_v01.pdf
  • NVIDIA H100 Tensor Core GPU Architecture Overview (whitepaper, 2023): https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepaper-hopper
  • NVIDIA H100 Tensor Core GPU Datasheet (datasheet, 2023): https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet
